Twinkle Eval: An Efficient and Accurate AI Evaluation Tool

Tags: eval, evaluation, llm
2 Open Issues Need Help · Last updated: Aug 6, 2025

Category: AI/ML · AI Model Evaluation

AI Summary: Debug a Twinkle Eval integration issue where using an API key generated from the NVIDIA GPT-OSS-20B example results in a `'NoneType' object is not subscriptable` error. This involves analyzing the differences between the NVIDIA endpoint's response format and the format Twinkle Eval expects, and potentially modifying Twinkle Eval's code to handle the NVIDIA response correctly.

Complexity: 4/5
bug help wanted
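A `'NoneType' object is not subscriptable` error typically means the parser indexed into a field that came back as `null`. A minimal defensive sketch of the fix, assuming an OpenAI-style `choices[0]["message"]["content"]` payload (all field names here are illustrative, not Twinkle Eval's actual code):

```python
# Sketch: extract the assistant text from a chat-completion-style payload,
# raising a descriptive error instead of a bare TypeError when the NVIDIA
# endpoint returns a shape Twinkle Eval does not expect. Hypothetical helper,
# not Twinkle Eval's real API.

def extract_answer(response: dict) -> str:
    """Return the assistant text, or raise ValueError with context."""
    choices = response.get("choices") if isinstance(response, dict) else None
    if not choices:
        raise ValueError(f"response has no 'choices': {response!r}")
    # `or {}` guards against a null "message" field, the likely crash site
    message = choices[0].get("message") or {}
    content = message.get("content")
    if content is None:
        raise ValueError(f"choice has no 'content': {choices[0]!r}")
    return content

# A well-formed payload parses normally:
ok = {"choices": [{"message": {"role": "assistant", "content": "42"}}]}
print(extract_answer(ok))  # 42

# A payload with a null message now raises a clear ValueError, not TypeError:
bad = {"choices": [{"message": None}]}
try:
    extract_answer(bad)
except ValueError as e:
    print("clear error:", e)
```

Logging the offending payload in the error message would also make it easier to pin down exactly how the NVIDIA response format differs.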

Python

AI Summary: The task is to modify the Twinkle Eval tool to correctly handle the `reasoning` parameter when interacting with the Ollama LLM. Currently, it uses `reasoning_content`, which is incorrect for Ollama. A check should be added to differentiate between Ollama and other LLMs (like vLLM) to use the appropriate parameter name.

Complexity: 4/5
bug help wanted
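The check the issue asks for can be sketched as a small backend-to-field-name mapping, assuming vLLM-style servers return the trace as `reasoning_content` while Ollama uses `reasoning` (the `backend` key and helper name are illustrative, not Twinkle Eval's actual API):

```python
# Sketch: pick the reasoning field name based on which backend produced the
# message. Hypothetical mapping; per the issue, Ollama uses "reasoning" while
# vLLM-style servers use "reasoning_content".
from typing import Optional

REASONING_FIELD = {
    "ollama": "reasoning",
    "vllm": "reasoning_content",
}

def get_reasoning(message: dict, backend: str) -> Optional[str]:
    """Read the model's reasoning trace using the backend's field name,
    falling back to the vLLM-style key for unknown backends."""
    field = REASONING_FIELD.get(backend, "reasoning_content")
    return message.get(field)

# Ollama-style message:
print(get_reasoning({"reasoning": "step 1..."}, "ollama"))        # step 1...
# vLLM-style message:
print(get_reasoning({"reasoning_content": "step 1..."}, "vllm"))  # step 1...
```

A table-driven lookup keeps the per-backend difference in one place, so adding another backend later is a one-line change rather than another `if` branch.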
