Requirements-to-Running-Code benchmark for AI/LLM systems and frameworks—builds, runs, and auto-scores apps across functional and non-functional metrics.

ai automatic-evaluation benchmark code-quality devops docker e2e-testing integration-testing kubernetes lighthouse llm mutation-testing opentelemetry owasp reproducibility requirements-engineering security semgrep software-engineering testing
Language: Python

6 Open Issues Need Help
Last updated: Sep 8, 2025

AI Summary: This GitHub issue aims to achieve complete English/Japanese bilingual support for all Req2Run benchmark documentation. While four key files are already translated, over 56 additional documentation files, including main reports, specifications, and detailed guides, currently exist only in English and need to be translated into Japanese.

Complexity: 4/5
Labels: documentation, good first issue, help wanted

Labels: enhancement, good first issue, help wanted

Labels: enhancement, help wanted

AI Summary: This issue describes the implementation of a real-time log aggregation pipeline. The system needs to ingest logs from various sources and formats, perform real-time processing including filtering, parsing, and advanced aggregations, store the data efficiently for time-series queries and full-text search, and provide rule-based and anomaly detection alerting capabilities.

Complexity: 5/5
Labels: enhancement, good first issue, help wanted
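
The summary does not pin down input formats or an alerting rule language, so the following is only one possible shape for the parse, filter, aggregate, and alert stages, written against the Python standard library. The line format, the `parse`/`window_counts`/`error_alerts` names, the 60-second tumbling window, and the ERROR-count threshold are all hypothetical; storage, full-text search, and anomaly detection are left out.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple

# Hypothetical input format, one event per line:
#   "2025-09-08T12:00:03Z ERROR payments Timeout talking to db"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<msg>.*)$")


@dataclass
class LogEvent:
    ts: datetime
    level: str
    service: str
    msg: str


def parse(line: str) -> Optional[LogEvent]:
    """Parse one raw line; lines that do not fit the assumed format are dropped (None)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    try:
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    except ValueError:
        return None  # matched the shape but the timestamp is not valid
    return LogEvent(ts, m["level"], m["service"], m["msg"])


def window_counts(events: Iterable[LogEvent], window_s: int = 60) -> Counter:
    """Tumbling-window aggregation: count events per (window start, service, level)."""
    counts: Counter = Counter()
    for ev in events:
        epoch = int(ev.ts.timestamp())
        counts[(epoch - epoch % window_s, ev.service, ev.level)] += 1
    return counts


def error_alerts(counts: Counter, threshold: int = 3) -> List[Tuple[int, str, int]]:
    """Rule-based alert: any (window, service) whose ERROR count reaches the threshold."""
    return sorted(
        (window, service, n)
        for (window, service, level), n in counts.items()
        if level == "ERROR" and n >= threshold
    )
```

A tumbling window is the simplest aggregation to reason about; sliding windows or late-arriving events would need per-window state with an expiry policy instead of one flat counter.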

AI Summary: This issue requests the creation of a baseline implementation for NET-001, a custom binary message protocol server over TCP. The server needs to handle message framing, multiple message types, connection management, and concurrent clients, with a specified file structure for the implementation.

Complexity: 4/5
Labels: enhancement, help wanted
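
Because TCP delivers a byte stream rather than discrete messages, a NET-001 server needs a framing layer that can reassemble frames split across `recv()` calls or separate several frames arriving in one call. The summary does not give the wire format, so this sketch assumes a 5-byte header (a 1-byte message type plus a 4-byte big-endian payload length); the type codes, the 1 MiB payload cap, and the `encode_frame`/`FrameDecoder` names are hypothetical.

```python
import struct
from typing import List, Tuple

# Hypothetical header: message type (1 byte) + payload length (4 bytes, big-endian unsigned).
HEADER = struct.Struct(">BI")
MAX_PAYLOAD = 1 << 20  # assumed 1 MiB cap so a hostile length field cannot exhaust memory

# Hypothetical message type codes, for illustration only.
MSG_PING, MSG_PONG, MSG_DATA = 1, 2, 3


def encode_frame(msg_type: int, payload: bytes) -> bytes:
    """Prefix the payload with the assumed type/length header."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large")
    return HEADER.pack(msg_type, len(payload)) + payload


class FrameDecoder:
    """Accumulates raw TCP bytes and returns every complete (type, payload) frame.

    One recv() may hold half a frame or several frames, so leftover bytes are
    kept in a buffer until the rest of the frame arrives.
    """

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[Tuple[int, bytes]]:
        self._buf.extend(data)
        frames: List[Tuple[int, bytes]] = []
        while len(self._buf) >= HEADER.size:
            msg_type, length = HEADER.unpack_from(self._buf, 0)
            if length > MAX_PAYLOAD:
                raise ValueError("declared payload exceeds limit")
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # header seen, payload still incomplete
            frames.append((msg_type, bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]
        return frames
```

Keeping the decoder free of socket I/O lets one instance sit behind each client connection, whether the server uses threads, `selectors`, or `asyncio`, and lets it be tested by feeding arbitrary chunk sizes.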

AI Summary: This issue requests the creation of a baseline implementation for CRYPTO-001, an AES-256-GCM file encryption tool. It requires implementing features like PBKDF2 key derivation, secure key generation, and metadata preservation, along with a CLI and comprehensive testing, following a detailed project structure.

Complexity: 4/5
Labels: enhancement, help wanted
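
The key-handling half of CRYPTO-001 can be covered by the Python standard library: `hashlib.pbkdf2_hmac` derives a 256-bit key from a password, and `secrets.token_bytes` supplies the random salt and the 96-bit GCM nonce. The container layout (an `ENC1` magic, the iteration count, the salt, and the nonce), the 600,000-iteration default, and the function names below are assumptions for illustration, not the format the issue specifies.

```python
import hashlib
import secrets
import struct
from typing import Tuple

# Hypothetical container header: magic, PBKDF2 iteration count, salt, GCM nonce.
MAGIC = b"ENC1"
HEADER = struct.Struct(">4sI16s12s")  # 4 + 4 + 16 + 12 = 36 bytes, no padding
KEY_LEN = 32  # 256-bit key for AES-256


def derive_key(password: str, salt: bytes, iterations: int) -> bytes:
    """PBKDF2-HMAC-SHA256 key derivation from the standard library."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=KEY_LEN)


def new_header(iterations: int = 600_000) -> Tuple[bytes, bytes, bytes]:
    """Create a header with a fresh salt and nonce; returns (header, salt, nonce).

    The iteration default is an assumption; tune it to the target hardware.
    A nonce must never be reused with the same key under GCM.
    """
    salt = secrets.token_bytes(16)
    nonce = secrets.token_bytes(12)  # 96-bit nonce, the usual size for GCM
    return HEADER.pack(MAGIC, iterations, salt, nonce), salt, nonce


def parse_header(blob: bytes) -> Tuple[int, bytes, bytes]:
    """Read (iterations, salt, nonce) back from the start of an encrypted file."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short for header")
    magic, iterations, salt, nonce = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not an encrypted container")
    return iterations, salt, nonce
```

AES-256-GCM itself is not in the standard library; with the third-party `cryptography` package, a caller could encrypt via `AESGCM(key).encrypt(nonce, plaintext, header)`, passing the header as associated data so that tampering with the salt, nonce, or iteration count makes decryption fail. Metadata preservation and the CLI are outside this sketch.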
