Open Issues Need Help
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
AI Summary: The task is to add support for speculative parallelism (SP) to the SpecForge framework so that models with long context lengths can be trained efficiently. This involves modifying the existing training scripts, and potentially the model architecture, to handle the heavier compute demands of longer contexts and parallel processing.
AI Summary: The task is to support Flex Attention in the SpecForge draft model. The TTT method has attention mask requirements that call for a more sophisticated attention mechanism than the naive implementation the draft model currently uses. Incorporating Flex Attention into that implementation could also improve training speed.
AI Summary: The task is to augment the existing benchmark script for the SpecForge project to include the calculation and reporting of speedup ratios. This involves comparing the inference speed of models using EAGLE3 speculative decoding against models without it.
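As a rough illustration of the comparison described above, a speedup ratio can be computed as throughput with speculative decoding divided by baseline throughput. This is a minimal sketch, not the project's benchmark script: the function name `speedup_ratio` and the throughput figures are hypothetical and chosen only for the example.

```python
def speedup_ratio(baseline_tokens_per_s: float, speculative_tokens_per_s: float) -> float:
    """Return how many times faster the speculative run is than the baseline."""
    if baseline_tokens_per_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return speculative_tokens_per_s / baseline_tokens_per_s


# Hypothetical measurements for illustration only (not real benchmark results).
baseline = 100.0     # tokens/s without speculative decoding
with_eagle3 = 250.0  # tokens/s with EAGLE3 speculative decoding

print(f"speedup: {speedup_ratio(baseline, with_eagle3):.2f}x")
```

A ratio above 1.0 means the speculative run was faster. Both runs would need the same prompts, hardware, and batch settings for the ratio to be meaningful.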