Clinician-built benchmark and live leaderboard for medical AI safety evaluation.

ai-benchmark ai-safety clinical-nlp clinician-review evaluation-framework failure-analysis failure-atlas healthcare-ai huggingface-spaces language-model-evaluation llm-evaluation llm-safety medfailbench medical-ai medical-ai-safety medical-language-models medical-llm patient-safety source-verification turkish-medical-ai
2 open issues need help. Last updated: Jul 3, 2026

Language: Python
Issue labels: documentation, good first issue