Open Issues Needing Help
Add vLLM or exllama optimization (opened about 1 month ago)
AI Summary: Integrate either the exllamav2 or vLLM inference engine into the existing Offline-Voice-LLM-assistant project to significantly reduce inference latency (currently around 5 seconds per response). This involves adding the chosen engine's dependencies, modifying the generation code to use its API, and benchmarking the result against the current baseline (see the sketch after this entry).
Complexity: 4/5
good first issue
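
The summary above amounts to swapping the assistant's current generation call for a faster engine. A minimal sketch of the vLLM route follows; the model id, sampling settings, and the `generate_reply` helper name are illustrative assumptions, not taken from the repository.

```python
# Sketch: replacing the assistant's generation step with vLLM.
# Assumptions (not from the repo): the model id and the
# generate_reply() helper name are placeholders for illustration.
from vllm import LLM, SamplingParams

# Load once at startup: vLLM keeps the weights resident and uses paged
# attention and continuous batching, which is where most of the latency
# win over a plain per-request generate() loop comes from.
llm = LLM(model="HuggingFaceTB/SmolLM3-3B")  # assumed SmolLM3-class model

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

def generate_reply(prompt: str) -> str:
    """Drop-in replacement for the existing generation call."""
    outputs = llm.generate([prompt], sampling)
    return outputs[0].outputs[0].text
```

The exllamav2 route is similar in spirit but uses its own quantized-weight loader and generator classes; either way, the issue's acceptance test is measuring end-to-end response time against the current ~5-second baseline.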
Running small but capable language models entirely offline
Python
#bitsandbytes #chatbot #edge-ai #huggingface #llm #local-ai #offline-ai #quantization #smollm3 #speech-recognition #voice-assistant #vosk