Open Issues Need Help
Add correctness tests to Yalis 2 months ago
enhancement help wanted
Optimise Yalis Engine Initialization Time 2 months ago
AI Summary: Optimize the YALIS LLM inference system's initialization time. This involves reducing both model loading time and the overhead of the first inference iteration, which is currently slow due to TorchDynamo/torch.compile warmup and CUDA graph recording. The goal is to improve the system's usability by significantly reducing startup time.
Complexity: 5/5
enhancement help wanted
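As a starting point for the initialization-time issue, here is a minimal sketch of how the two costs named in the summary (model loading and first-iteration overhead) could be measured separately. It is not taken from the Yalis codebase: `profile_startup`, `_FakeEngine`, `_fake_load`, and `_fake_iteration` are hypothetical names, and the sleeps merely stand in for real loading and warmup work.

```python
import time
from typing import Callable, Dict


def profile_startup(load_model: Callable[[], object],
                    run_iteration: Callable[[object], None],
                    steady_iters: int = 5) -> Dict[str, float]:
    """Time model loading, the first iteration, and the steady-state average."""
    t0 = time.perf_counter()
    model = load_model()
    t1 = time.perf_counter()
    # The first call includes any one-time warmup (compilation, graph capture).
    run_iteration(model)
    t2 = time.perf_counter()
    for _ in range(steady_iters):
        run_iteration(model)
    t3 = time.perf_counter()

    first = t2 - t1
    steady = (t3 - t2) / steady_iters
    return {
        "load_s": t1 - t0,
        "first_iter_s": first,
        "steady_iter_s": steady,
        # Extra cost of the first iteration relative to a warmed-up one.
        "warmup_overhead_s": max(0.0, first - steady),
    }


# Hypothetical stand-ins for a real engine (assumptions for illustration only).
class _FakeEngine:
    def __init__(self) -> None:
        self.warmed_up = False


def _fake_load() -> _FakeEngine:
    time.sleep(0.02)  # pretend to read weights from disk
    return _FakeEngine()


def _fake_iteration(engine: _FakeEngine) -> None:
    if not engine.warmed_up:
        time.sleep(0.05)  # pretend one-time compile / graph-recording cost
        engine.warmed_up = True
    time.sleep(0.001)  # pretend steady-state forward pass


report = profile_startup(_fake_load, _fake_iteration)
for name, seconds in report.items():
    print(f"{name}: {seconds:.4f}")
```

With a real GPU engine, the timestamps would only be meaningful if the device is synchronized before each one (for example with `torch.cuda.synchronize()`), since CUDA kernels launch asynchronously.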