Running small but capable language models entirely offline

bitsandbytes chatbot edge-ai huggingface llm local-ai offline-ai quantization smollm3 speech-recognition voice-assistant vosk
1 open issue needs help · Last updated: Jul 28, 2025

AI Summary: Integrate either the exllamav2 or vLLM inference engine into the existing Offline-Voice-LLM-assistant project to substantially reduce inference latency, currently around 5 seconds per response. This involves adding the chosen engine as a dependency, modifying the generation code to call its API, and benchmarking the resulting speedup.

Complexity: 4/5
good first issue
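
The issue leaves the engine choice open. As a rough sketch of the vLLM route, the snippet below loads the model once at startup and serves each transcribed utterance from the resident engine; the model ID HuggingFaceTB/SmolLM3-3B and the example prompt are assumptions based on the project's smollm3 tag, not taken from its code:

```python
# Minimal offline-generation sketch using vLLM's LLM API.
# Assumes `pip install vllm` and a locally cached copy of the model,
# so no network access is needed at inference time.
from vllm import LLM, SamplingParams

# Load once at startup: vLLM keeps the weights resident and batches
# requests, which is where the latency win over reloading per call comes from.
llm = LLM(model="HuggingFaceTB/SmolLM3-3B", dtype="auto")  # model ID is illustrative

params = SamplingParams(temperature=0.7, max_tokens=128)

# One call per user utterance; outputs[i] corresponds to prompts[i].
prompts = ["What is the weather like on Mars?"]
outputs = llm.generate(prompts, params)
print(outputs[0].outputs[0].text)
```

exllamav2 would take a similar shape but runs EXL2-quantized weights, which tends to fit smaller GPUs; with either engine the main change is keeping the model resident rather than paying load and setup costs on every response.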

Python