aviralgarg05/agentunit

AgentUnit is a pytest-inspired evaluation harness for autonomous agents and retrieval-augmented generation (RAG) workflows. It helps you describe repeatable scenarios, connect them to your agent stack, and score results with both heuristic and LLM-backed metrics.

Stars: 7 | Forks: 15 | Watchers: 7 | Language: Python | License: MIT

Topics: framework, pypi-package, pytest, python

40 Open Issues Need Help (last updated: Jan 23, 2026)

Open Issues Need Help

Fix type annotation errors in llm_generator.py (about 21 hours ago)

AI Summary: Mypy reports missing type annotations for the `cases` list in both `LlamaDatasetGenerator` and `OpenAIDatasetGenerator` in `src/agentunit/generators/llm_generator.py`. The task is to add the correct type hints to these lists so the errors are resolved.

Complexity: 1/5

Labels: good first issue

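A minimal sketch of the fix the summary describes, assuming each generator collects parsed items into a list of `DatasetCase` objects. The `DatasetCase` fields and the `parse_generated_cases` function below are hypothetical stand-ins, not AgentUnit's actual definitions; the point is only the explicit `list[DatasetCase]` annotation on `cases`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class DatasetCase:
    # Hypothetical stand-in for agentunit's DatasetCase; the real fields may differ.
    id: str
    query: str
    expected_output: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


def parse_generated_cases(raw_items: list[dict[str, Any]]) -> list[DatasetCase]:
    # Hypothetical helper. An explicit element type means mypy does not have to
    # infer it; a bare `cases = []` it cannot infer is reported as
    # 'Need type annotation for "cases"'.
    cases: list[DatasetCase] = []
    for index, item in enumerate(raw_items):
        cases.append(
            DatasetCase(
                id=item.get("id", f"case_{index}"),
                query=item["query"],
                expected_output=item.get("expected_output"),
            )
        )
    return cases
```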

Add logging to LLM dataset generators (16 days ago)

Labels: enhancement, good first issue

Implement K-Anonymity logic in PrivateDatasetWrapper (16 days ago)

Labels: enhancement, help wanted

Add smoke test for suite_template FAQAdapter (17 days ago)

Labels: help wanted, good first issue, tests

Add CSV Export for Test Results (23 days ago)

AI Summary: The `SuiteResult` class, which currently supports JSON, Markdown, JUnit, and HTML exports, needs a new `to_csv(self, path)` method. This method should write test results to a CSV file and flatten nested metrics into separate columns so the output is easier to analyze in a spreadsheet.

Complexity: 3/5

Labels: enhancement, good first issue

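One way to implement the flattening, sketched under the assumption that each test result can be turned into a plain dict whose `metrics` entry may contain nested dicts. `flatten`, `render_results_csv`, and `write_results_csv` are hypothetical helpers, not `SuiteResult`'s existing API. Nested keys are joined with dots, so `{"metrics": {"faithfulness": {"score": 0.9}}}` becomes a `metrics.faithfulness.score` column.

```python
from __future__ import annotations

import csv
import io
from collections.abc import Mapping
from pathlib import Path
from typing import Any


def flatten(record: Mapping[str, Any], parent_key: str = "", sep: str = ".") -> dict[str, Any]:
    """Collapse nested mappings into one level with dotted column names."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, Mapping):
            # Recurse so {"metrics": {"f1": 0.8}} becomes {"metrics.f1": 0.8}.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


def render_results_csv(results: list[Mapping[str, Any]]) -> str:
    """Render results as CSV text; columns are the union of all flattened keys."""
    rows = [flatten(result) for result in results]
    fieldnames: list[str] = []
    for row in rows:
        for column in row:
            if column not in fieldnames:  # keep first-seen column order
                fieldnames.append(column)
    buffer = io.StringIO()
    # restval="" leaves a blank cell when a result lacks a metric others have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_results_csv(results: list[Mapping[str, Any]], path: str | Path) -> Path:
    """Write the rendered CSV to `path` and return the path."""
    target = Path(path)
    # newline="" stops Python from translating the csv module's \r\n endings.
    with target.open("w", newline="", encoding="utf-8") as handle:
        handle.write(render_results_csv(results))
    return target
```

A `to_csv(self, path)` method could convert its results to dicts and delegate to a helper like `write_results_csv`. Dotted names are one choice; underscores work equally well if metric names themselves contain dots.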

Refactor Scenario to Decouple Adapters (24 days ago)

Labels: help wanted, refactor, architecture

Improve Robustness of CSV Dataset Loader (24 days ago)

Labels: enhancement, good first issue, data-handling

Implement Real Execution for Coordination Tests (24 days ago)

Labels: enhancement, help wanted, cli

Add Visualizations to Dashboard Reports (24 days ago)

Labels: enhancement, help wanted, frontend

Implement Regression Detection UI (24 days ago)

Labels: enhancement, help wanted

Implement Duration & Response Time Tracking in AG2Adapter (24 days ago)

Labels: enhancement, help wanted

Implement Duration Tracking in SwarmAdapter (24 days ago)

Labels: enhancement, good first issue

Refactor datetime.utcnow usages to timezone-aware datetime (28 days ago)

AI Summary: This issue proposes replacing all usages of `datetime.utcnow()`, which returns a naive datetime and is deprecated since Python 3.12, with its timezone-aware equivalent, `datetime.now(timezone.utc)`. The goal is to eliminate deprecation warnings and produce unambiguous UTC timestamps while preserving the existing string formatting, and to add a small unit test that verifies the change.

Complexity: 1/5

Labels: help wanted, good first issue, chore

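A minimal sketch, assuming the refactor routes call sites through a small helper; `utc_now` and `format_timestamp` are hypothetical names. The subtle part of preserving the existing string formatting is that `isoformat()` on an aware datetime appends `+00:00`, which naive `utcnow().isoformat()` never did, so strings that must stay stable need the offset stripped after converting to UTC.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime.

    Hypothetical replacement for datetime.utcnow(), which returns a naive
    datetime and is deprecated since Python 3.12.
    """
    return datetime.now(timezone.utc)


def format_timestamp(moment: datetime) -> str:
    """Render an aware datetime the way naive utcnow().isoformat() used to.

    isoformat() on an aware datetime appends '+00:00'; converting to UTC and
    then dropping tzinfo keeps previously emitted strings unchanged.
    """
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).replace(tzinfo=None).isoformat()
```

Call sites change from `datetime.utcnow()` to `utc_now()`; only code that writes timestamps into reports or files needs `format_timestamp`.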

Add HTML report exporter with charts (about 1 month ago)

Labels: enhancement, help wanted

Fix typos in docstrings (about 1 month ago)

Labels: documentation, help wanted, good first issue

Add sample scenario tests for suite_template.py (about 2 months ago)

Labels: help wanted, good first issue, tests

Add result caching for unchanged scenarios (about 2 months ago)

Labels: enhancement, help wanted, performance

Add missing type hints to DatasetCase (about 2 months ago)

Labels: enhancement, help wanted, good first issue

Improve test coverage for metrics registry (about 2 months ago)

Labels: help wanted, good first issue, tests

Create pytest plugin for scenario discovery (about 2 months ago)

Labels: enhancement, help wanted

Add LangGraph integration tests (about 2 months ago)

Labels: help wanted, tests

Add basic evaluation example script (about 2 months ago)

Labels: documentation, help wanted, good first issue

Support parallel scenario execution (2 months ago)

Labels: enhancement, help wanted

Add dataset sampling strategies (2 months ago)

Labels: enhancement, help wanted

Create VS Code extension for AgentUnit configs (2 months ago)

Labels: enhancement, help wanted

Add scenario tagging and filtering (2 months ago)

Labels: enhancement, help wanted

Add streaming output for live evaluation progress (2 months ago)

Labels: enhancement, help wanted

Support custom metrics via YAML configuration (2 months ago)

Labels: enhancement, help wanted

Add baseline comparison with regression detection (2 months ago)

Labels: enhancement, help wanted

Add --watch mode for continuous evaluation (2 months ago)

Labels: enhancement, help wanted

Add LLM cost tracking and budgeting (2 months ago)

Labels: enhancement, help wanted

Support async adapters (2 months ago)

Labels: enhancement, help wanted

Add retry decorator with exponential backoff (2 months ago)

Labels: enhancement, help wanted

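A minimal sketch of such a decorator; the name `retry`, its parameters, and the use of "full jitter" are assumptions for illustration, not an existing AgentUnit API. Before retry n (counting from 0) it waits up to `base_delay * 2**n` seconds, capped at `max_delay`, and it re-raises the last error once `max_attempts` calls have failed.

```python
from __future__ import annotations

import functools
import random
import time
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def retry(
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    exceptions: tuple[type[BaseException], ...] = (Exception,),
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry the wrapped call on `exceptions`, doubling the delay each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> T:
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    # Exponential backoff: base_delay, 2x, 4x, ... up to max_delay.
                    delay = min(max_delay, base_delay * (2 ** attempt))
                    if jitter:
                        # Full jitter: wait a random time up to the backoff value.
                        delay = random.uniform(0, delay)
                    sleep(delay)
            raise AssertionError("unreachable")

        return wrapper

    return decorator
```

Passing `sleep` in as a parameter keeps unit tests instant, and narrowing `exceptions` to transient errors (timeouts, rate limits) avoids retrying calls that fail because of a bug.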

Add docstrings to BaseAdapter methods (2 months ago)

Labels: documentation, help wanted, good first issue

Expose __version__ in agentunit package (2 months ago)

Labels: enhancement, help wanted, good first issue


Add py.typed marker for type checker support (2 months ago)

Labels: enhancement, help wanted, good first issue

Add test for markdown report emoji encoding (2 months ago)

Labels: help wanted, good first issue, tests

Clarify local CI commands in README (2 months ago)

Labels: documentation, help wanted, good first issue

Pin datetime usage to timezone-aware helper (2 months ago)

Labels: help wanted, good first issue, chore

Add quickstart instructions for running CI checks locally (2 months ago)

Labels: documentation, help wanted, good first issue