Frontier LLMs collapse past ~22 reasoning steps on state-space search, but tool delegation stays near-perfect. Benchmark, theory and code behind the Deterministic Horizon (ICML 2026).

ai-agents benchmark chain-of-thought evaluation large-language-models llm machine-learning reasoning tool-use
2 open issues need help. Last updated: Jul 3, 2026

Python
documentation, good first issue
