From prompt to paste: evaluate AI / LLM output under a strict Python sandbox and get actionable scores across 7 categories, including security, correctness and upkeep.
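
To make the tagline concrete, here is a minimal, hypothetical sketch of the core idea, not this repository's actual API: write the LLM-generated Python to a temp file, execute it in a separate, time-limited subprocess, and derive per-category scores from the result. The function names `run_in_sandbox` and `score` are illustrative assumptions.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_sandbox(code: str, timeout: float = 5.0) -> dict:
    """Run untrusted code in a separate, isolated Python process."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode, ignores user site-packages and env vars
                capture_output=True,
                text=True,
                timeout=timeout,  # kill hanging or long-running code
                cwd=tmp,          # keep any file writes inside the temp dir
            )
        except subprocess.TimeoutExpired:
            return {"returncode": None, "stdout": "", "stderr": "timed out"}
    return {"returncode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


def score(report: dict) -> dict:
    """Toy scorer: only correctness is derived here; the other categories
    (security, upkeep, ...) would need static analysis and richer checks."""
    return {"correctness": 1.0 if report["returncode"] == 0 else 0.0}


if __name__ == "__main__":
    report = run_in_sandbox("print(sum(range(10)))")
    print(report["stdout"].strip(), score(report))
```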

1 star · 1 fork · 1 watcher · Python · Apache License 2.0
Tags: ai, benchmark, benchmarking, benchmarking-suite, code-generation, code-security, evals, evaluation, generative-ai, llm, python, python3, reproducibility, risk-assessment, sandbox, security, static-analysis, testing
Last updated: Sep 14, 2025

Open Issues Need Help (2)

Test coverage · help wanted · about 2 months ago

bug · good first issue
