πŸΎπŸ” The tool-call detective for small models on Apple Silicon β€” attributes every tool-calling failure to one of four causes (template bug / parser gap / model format / model decision). MLX-native, reproducible, bootstrap CIs on every metric.

0 stars, 1 fork, 0 watchers, Python, Apache License 2.0
apple-silicon benchmark function-calling llm-evaluation llm-tools local-llm mlx mlx-lm small-language-models tool-calling
5 open issues need help. Last updated: Jul 3, 2026
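
The description mentions bootstrap CIs on every metric. As a rough illustration of what that can look like, here is a minimal percentile-bootstrap sketch for a tool-call success rate. It is not taken from the project: the function name `bootstrap_ci`, its parameters, and the sample data are all hypothetical, and the project may use a different bootstrap variant.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of 0/1 outcomes (hypothetical helper)."""
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(outcomes)
    # Resample with replacement n_resamples times and record each resample's mean.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    # Read the interval bounds off the sorted resample means.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(outcomes) / n, lo, hi


# Hypothetical data: 1 = tool call succeeded, 0 = failed.
outcomes = [1] * 37 + [0] * 13
point, lo, hi = bootstrap_ci(outcomes)
print(f"success rate {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The same helper could be applied per failure cause (template bug, parser gap, model format, model decision) by passing a 0/1 indicator for that cause instead of overall success.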

Open Issues Need Help

enhancement help wanted

πŸΎπŸ” The tool-call detective for small models on Apple Silicon β€” attributes every tool-calling failure to one of four causes (template bug / parser gap / model format / model decision). MLX-native, reproducible, bootstrap CIs on every metric.

Python
#apple-silicon#benchmark#function-calling#llm-evaluation#llm-tools#local-llm#mlx#mlx-lm#small-language-models#tool-calling
documentation good first issue

πŸΎπŸ” The tool-call detective for small models on Apple Silicon β€” attributes every tool-calling failure to one of four causes (template bug / parser gap / model format / model decision). MLX-native, reproducible, bootstrap CIs on every metric.

Python
#apple-silicon#benchmark#function-calling#llm-evaluation#llm-tools#local-llm#mlx#mlx-lm#small-language-models#tool-calling
good first issue

πŸΎπŸ” The tool-call detective for small models on Apple Silicon β€” attributes every tool-calling failure to one of four causes (template bug / parser gap / model format / model decision). MLX-native, reproducible, bootstrap CIs on every metric.

Python
#apple-silicon#benchmark#function-calling#llm-evaluation#llm-tools#local-llm#mlx#mlx-lm#small-language-models#tool-calling