pytest for AI agents — test safety, accuracy, tool use, and cost. No YAML, no telemetry, just Python.

agents ai ai-safety anthropic compliance developer-tools eval evaluation llm machine-learning openai pytest python safety testing
3 Open Issues Need Help Last updated: Mar 16, 2026

Open Issues Need Help

View All on GitHub
AI/ML AI Chatbots & Agents

AI Summary: This issue proposes adding a new assertion, `.grounded(context)`, to a testing framework. This assertion will verify that a language model's response is factually supported by a given context, preventing the model from 'hallucinating' information not present in the source material. The goal is to ensure model outputs are reliable and directly derived from provided data.

Complexity: 2/5
enhancement good first issue

pytest for AI agents — test safety, accuracy, tool use, and cost. No YAML, no telemetry, just Python.

Python
#agents#ai#ai-safety#anthropic#compliance#developer-tools#eval#evaluation#llm#machine-learning#openai#pytest#python#safety#testing
AI/ML AI Chatbots & Agents

AI Summary: This issue proposes adding support for Azure OpenAI endpoints as a new provider option. This would involve accepting specific environment variables for Azure, constructing the correct base URL, and including the API version parameter. This enhancement is aimed at enterprise users who leverage Azure for their OpenAI deployments.

Complexity: 2/5
enhancement good first issue

pytest for AI agents — test safety, accuracy, tool use, and cost. No YAML, no telemetry, just Python.

Python
#agents#ai#ai-safety#anthropic#compliance#developer-tools#eval#evaluation#llm#machine-learning#openai#pytest#python#safety#testing
AI/ML AI Chatbots & Agents

AI Summary: This issue proposes adding a new 'bias detection' prompt pack to the proofagent tool. This pack would include prompts designed to test AI models for biased responses related to race, gender, and age, ensuring equitable outputs. The implementation involves adding a new list to `packs.py` and registering it, with a decision to be made on the assertion method.

Complexity: 2/5
enhancement good first issue

pytest for AI agents — test safety, accuracy, tool use, and cost. No YAML, no telemetry, just Python.

Python
#agents#ai#ai-safety#anthropic#compliance#developer-tools#eval#evaluation#llm#machine-learning#openai#pytest#python#safety#testing