Code-byte404/toolhound

Add a zero-training method to the methods/ framework (e.g. TSCG) (about 2 hours ago)

enhancement, help wanted

0

🐾🔍 The tool-call detective for small models on Apple Silicon — attributes every tool-calling failure to one of four causes (template bug / parser gap / model format / model decision). MLX-native, reproducible, bootstrap CIs on every metric.

Python

#apple-silicon #benchmark #function-calling #llm-evaluation #llm-tools #local-llm #mlx #mlx-lm #small-language-models #tool-calling

Implement the constrained-decoding seam: backend.generate(grammar=...) (about 2 hours ago)

enhancement, help wanted

Code-byte404/toolhound

Add a model to the registry (and file the bugs it surfaces) (about 2 hours ago)

enhancement, help wanted

Code-byte404/toolhound

Docs: add an English methodology page (docs/methodology.md) (about 2 hours ago)

documentation, good first issue

Code-byte404/toolhound

Add abstention-trap test cases (C5/C6) (about 2 hours ago)

good first issue

Code-byte404/toolhound

Open Issues Need Help