Open Issues Need Help
optillm: Optimizing inference proxy for LLMs

AI Summary: The task requires implementing a novel LLM inference optimization technique: a real-time, reinforcement-learning-trained hint injection system. This involves creating a new module within the existing optillm framework, integrating it with the existing components (memory, router, CoT), and evaluating its performance on a range of benchmark tasks. The system will monitor the model's reasoning stream, retrieve relevant hints from a memory bank, inject them at opportune moments, and learn when injection pays off through reinforcement learning (see the sketch below).
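A minimal sketch of what such a system could look like, in Python. Everything below is hypothetical: the HINT_BANK contents, the HintInjector class, and the epsilon-greedy value updates are illustrative stand-ins, not optillm APIs. The idea is to treat each trigger keyword as a bandit arm and learn, from a downstream correctness reward, whether injecting the associated hint tends to help.

```python
import random
from collections import defaultdict

# Hypothetical memory bank of hints, keyed by keywords that might
# appear in the model's streamed reasoning (a stand-in for retrieval
# against optillm's memory component).
HINT_BANK = {
    "integral": "Hint: consider substitution before integrating by parts.",
    "prime": "Hint: check divisibility by small primes first.",
    "recursion": "Hint: define the base case before the recursive step.",
}

class HintInjector:
    """Epsilon-greedy bandit that learns, per trigger keyword, whether
    injecting the associated hint tends to improve the final answer."""

    def __init__(self, epsilon: float = 0.1, lr: float = 0.1):
        self.epsilon = epsilon  # exploration rate
        self.lr = lr            # step size for value updates
        # Estimated value of injecting for each keyword (Q-values).
        self.q = defaultdict(float)

    def maybe_inject(self, chunk: str) -> tuple[str, str | None]:
        """Scan one streamed chunk of reasoning; return the (possibly
        augmented) chunk plus the keyword recorded for later credit."""
        for keyword, hint in HINT_BANK.items():
            if keyword in chunk:
                explore = random.random() < self.epsilon
                # Inject when exploring, or when past rewards say it helps.
                if explore or self.q[keyword] > 0:
                    return chunk + "\n" + hint + "\n", keyword
        return chunk, None

    def update(self, keyword: str, reward: float) -> None:
        """Move the Q-value for this keyword toward the observed reward
        (e.g., +1 if the final answer was correct, -1 otherwise)."""
        self.q[keyword] += self.lr * (reward - self.q[keyword])

# Usage: feed streamed chunks through the injector, then reward the
# recorded decisions once the final answer has been graded.
injector = HintInjector()
stream = ["Let me think about this integral...", "First I expand the terms."]
decisions = []
for chunk in stream:
    out, kw = injector.maybe_inject(chunk)
    if kw:
        decisions.append(kw)
    print(out)

answer_was_correct = True  # in practice, from a benchmark grader
for kw in decisions:
    injector.update(kw, 1.0 if answer_was_correct else -1.0)
```

In a real integration, the reward signal would come from the benchmark evaluation the issue calls for, and retrieval would presumably go through optillm's existing memory component rather than keyword matching.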
optillm: Optimizing inference proxy for LLMs

AI Summary: Implement a new plugin for the optillm inference proxy that performs k parallel inferences and selects the response that appears most frequently (majority vote). This involves creating a new plugin module, integrating it into the existing proxy architecture, and adding the appropriate configuration options (such as the sample count k). The plugin should be comparable to the existing best-of-n approach (see the sketch below).
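A minimal sketch of the voting logic, assuming an OpenAI-compatible client and a run() entry point in the style of optillm's plugin convention; the SLUG value, the k parameter, and the exact signature are assumptions, not the project's confirmed API.

```python
from collections import Counter

SLUG = "majority_voting"  # hypothetical plugin identifier

def run(system_prompt: str, initial_query: str, client, model: str, k: int = 5):
    """Sample k completions and return the one whose normalized text
    occurs most often, plus the completion tokens used."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": initial_query},
    ]
    # One API call with n=k is cheaper than k separate calls when the
    # backend supports it (OpenAI-compatible chat completions do).
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        n=k,
        temperature=0.7,  # some diversity is needed for voting to help
    )
    texts = [choice.message.content.strip() for choice in response.choices]
    # Majority vote over exact normalized strings; a real implementation
    # might instead extract and compare only the final answers.
    winner, _count = Counter(texts).most_common(1)[0]
    return winner, response.usage.completion_tokens
```

The contrast with best-of-n is that best-of-n scores each candidate with a rating model and keeps the top-scoring one, whereas majority voting needs no judge: it simply counts duplicate answers, so it works best when responses can be normalized to comparable final answers.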