llm-d benchmark scripts and tooling

Shell · #incubating
7 open issues need help · Last updated: Jul 24, 2025


Labels: help wanted, EPIC


AI Summary: The task is to adapt the `llm-d-benchmark` project's run component to convert the output of all supported harnesses (fmperf, inference-perf, guidellm, vllm benchmarks) into a standardized 'universal format' as defined in issue #185, following the example in issue #205. This ensures consistent reporting across different benchmarking tools.

Complexity: 4/5
Labels: help wanted
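A conversion step like this usually amounts to one small adapter per harness that emits a shared envelope. The sketch below is illustrative only, not the schema from issue #185: the field names (`harness`, `metrics`) and the `key value` raw-file layout are invented for the example, and real harness output (JSON, CSV) would need a proper per-harness parser.

```shell
#!/usr/bin/env bash
# Hypothetical adapter: wrap harness-specific "key value" pairs in a
# shared YAML envelope. Field names are illustrative, not issue #185's.
convert_to_universal() {
  local harness="$1" raw_file="$2" out_file="$3"
  {
    echo "harness: ${harness}"
    echo "metrics:"
    # Assumes one "key value" pair per line in the raw file.
    while read -r key value; do
      echo "  ${key}: ${value}"
    done < "${raw_file}"
  } > "${out_file}"
}
```

Each supported harness (fmperf, inference-perf, guidellm, vllm benchmarks) would get its own parsing branch in place of the `while read` loop.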


AI Summary: The task is to adapt the `guidellm` harness within the `llm-d-benchmark` project to output benchmark results in a standardized "universal format" as defined in issue #185. This involves modifying the `guidellm` code to match the conversion example provided in issue #205, ensuring compatibility with the project's reporting system.

Complexity: 4/5
Labels: help wanted


AI Summary: The task is to modify the `llm-d-benchmark` setup so that the environment variables used for deployment parameters are converted into a YAML structure, aligning with the universal format defined in issue #185. This involves updating the setup/standup component of the benchmarking system to handle the new YAML-based configuration.

Complexity: 3/5
Labels: help wanted
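As a rough shape of that change, selected environment variables can be mirrored into a YAML document. The variable names and the `deployment:` key below are placeholders, not the actual variables or the schema from issue #185.

```shell
#!/usr/bin/env bash
# Sketch: serialize selected deployment-parameter environment variables
# as YAML. Variable names and the top-level key are placeholders.
envvars_to_yaml() {
  local out_file="$1"; shift
  local var key
  {
    echo "deployment:"
    for var in "$@"; do
      # Lower-case the variable name to form the YAML key.
      key=$(printf '%s' "$var" | tr '[:upper:]' '[:lower:]')
      echo "  ${key}: ${!var}"   # ${!var}: bash indirect expansion
    done
  } > "${out_file}"
}
```

The real change would also need quoting/escaping for values that are not plain YAML scalars.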


AI Summary: The `run.sh` script in the `llm-d-benchmark` project needs to be fixed to properly handle the `-n` or `--dry-run` flag. Currently, even in dry-run mode, it attempts to interact with the cluster and reports an error if resources are not found, which is unexpected behavior for a dry run. The fix involves modifying the script to prevent actual cluster interactions during dry runs and only print the commands that would have been executed.

Complexity: 4/5
Labels: good first issue, help wanted
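The usual fix for this class of bug is to funnel every cluster-mutating command through a single wrapper that checks the flag. This is a generic sketch of that pattern, not the actual contents of `run.sh`; the `kubectl` command in the trailing comment is illustrative.

```shell
#!/usr/bin/env bash
# Generic dry-run guard: parse -n/--dry-run, then route commands through
# run_cmd so a dry run only prints what would have been executed.
DRY_RUN=0
for arg in "$@"; do
  case "$arg" in
    -n|--dry-run) DRY_RUN=1 ;;
  esac
done

run_cmd() {
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

# e.g. run_cmd kubectl get pods   # printed, never executed, when -n is set
```

Because the command is passed as arguments rather than a string, `run_cmd` never touches the cluster in dry-run mode, which is exactly the behavior the issue asks for.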


AI Summary: Extend the existing CI/CD pipeline to include testing for all supported harnesses (inference-perf, fmperf, vllm-benchmark, guidellm), ensuring that data collection works correctly for each.

Complexity: 4/5
Labels: help wanted
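Structurally, such a CI step is a loop over the harness list with a per-harness data-collection check. The `run_harness` stub and the `results/<harness>/metrics.yaml` path below are assumptions standing in for whatever the real pipeline invokes and produces.

```shell
#!/usr/bin/env bash
# CI sketch: exercise every supported harness and fail the job if a run
# leaves no collected data behind. run_harness is a stand-in stub.
set -e
HARNESSES="inference-perf fmperf vllm-benchmark guidellm"

run_harness() {
  # Stub: a real step would launch the harness against a test deployment.
  mkdir -p "results/$1"
  echo "ok" > "results/$1/metrics.yaml"
}

for h in $HARNESSES; do
  run_harness "$h"
  if [ ! -s "results/$h/metrics.yaml" ]; then
    echo "no data collected for $h" >&2
    exit 1
  fi
done
```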
