FluidInference/FluidAudio

Support streaming speaker diarization about 2 months ago

enhancement, good first issue, speaker-diarization

613

Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.

Swift

#ane #asr #audio #automatic-speech-recognition #avfoundation #coreml #ios #macos #nvidia #parakeet #real-time #speaker-diarization #speaker-embedding #speaker-identification #speaker-recognition #speech-to-text #swift #vad #voice-activity-detection

Load function ignores user specified computeUnits 3 months ago

AI Summary: The task is to fix a bug in the `AsrModels.load` function of the FluidAudio Swift framework. The function currently ignores the user-specified compute units (`configuration.computeUnits`) and always uses the optimized compute units (`optimizedConfig.computeUnits`) instead. The fix is to make `load` honor the user-provided compute unit configuration.

Complexity: 3/5

bug, good first issue
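
For the compute-units issue above, the intended precedence can be sketched as follows: the caller's choice wins, and the optimized value is only a fallback. This is a minimal sketch, not FluidAudio's code. `ComputeUnits` is a stand-in for CoreML's `MLComputeUnits` so the example runs without CoreML, and `ModelConfiguration`, `optimizedComputeUnits`, and `resolveComputeUnits` are hypothetical names. Modeling "the caller did not choose" as `nil` is also an assumption.

```swift
// Stand-in for CoreML's MLComputeUnits (assumption: used only so the
// sketch compiles without importing CoreML).
enum ComputeUnits {
    case cpuOnly, cpuAndGPU, cpuAndNeuralEngine, all
}

// Hypothetical configuration type: nil means the caller expressed no preference.
struct ModelConfiguration {
    var computeUnits: ComputeUnits? = nil
}

// Hypothetical library default, applied only when the caller did not choose.
func optimizedComputeUnits() -> ComputeUnits {
    .cpuAndNeuralEngine
}

// The user's explicit setting takes precedence over the optimized default.
func resolveComputeUnits(for configuration: ModelConfiguration) -> ComputeUnits {
    configuration.computeUnits ?? optimizedComputeUnits()
}

let userChoice = resolveComputeUnits(for: ModelConfiguration(computeUnits: .cpuOnly))
let fallback = resolveComputeUnits(for: ModelConfiguration())
print(userChoice, fallback)
```

The bug described in the summary corresponds to returning the optimized value unconditionally; the sketch only shows the order in which the two sources should be consulted.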

iOS live ASR 3 months ago

AI Summary: The task is to provide a Swift code snippet demonstrating live Automatic Speech Recognition (ASR) from a microphone on iOS using the FluidAudio library. The snippet should handle audio input from the microphone, process it using the library's (currently missing) `transcribeChunk` function (or an equivalent), and output the transcription to the console. The solution needs to account for the user's inexperience with iOS development.

Complexity: 3/5

documentation, good first issue
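
For the live ASR request above, one piece that does not depend on the library is the buffering step: microphone callbacks deliver samples in arbitrary sizes, while a chunk-based transcriber expects fixed-size chunks. The sketch below shows only that step, in plain Swift. `ChunkBuffer` and its members are hypothetical names, and the `onChunk` closure stands in for whatever transcription call the library ends up exposing (the summary notes that `transcribeChunk` is currently missing). The chunk size is an arbitrary illustrative value.

```swift
// Accumulates incoming samples and emits fixed-size chunks.
// Hypothetical helper, not part of FluidAudio.
struct ChunkBuffer {
    let chunkSize: Int
    private var pending: [Float] = []

    init(chunkSize: Int) {
        precondition(chunkSize > 0, "chunkSize must be positive")
        self.chunkSize = chunkSize
    }

    // Number of samples waiting for the next full chunk.
    var bufferedCount: Int { pending.count }

    // Appends samples and calls onChunk once per complete chunk, in order.
    mutating func append(_ samples: [Float], onChunk: ([Float]) -> Void) {
        pending.append(contentsOf: samples)
        while pending.count >= chunkSize {
            onChunk(Array(pending.prefix(chunkSize)))
            pending.removeFirst(chunkSize)
        }
    }
}

// Usage: the closure is where a transcription call would go.
var buffer = ChunkBuffer(chunkSize: 4)
var chunks: [[Float]] = []
buffer.append([0.1, 0.2, 0.3]) { chunks.append($0) }            // not enough samples yet
buffer.append([0.4, 0.5, 0.6, 0.7, 0.8, 0.9]) { chunks.append($0) }
print("chunks emitted:", chunks.count, "samples still buffered:", buffer.bufferedCount)
```

In an iOS app the samples would typically come from a tap on an `AVAudioEngine` input node, which also requires microphone permission; that wiring is left out here so the example stays self-contained.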

Why does the VAD model download every time it initializes? 3 months ago

AI Summary: The task is to modify the FluidAudio Swift framework so that the VAD models are not downloaded from Hugging Face again on each app launch. This involves integrating the pre-downloaded CoreML models directly into the Xcode project, eliminating the need for online retrieval.

Complexity: 3/5

bug, good first issue
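
For the VAD download issue above, the general "look locally before downloading" pattern can be sketched like this. It is a minimal sketch under stated assumptions, not the framework's loader: `resolveModel`, its parameters, and the `download` closure are hypothetical, and the model file name is whatever the caller passes in.

```swift
import Foundation

// Returns the first local copy of the model, and only falls back to
// `download` when no listed directory contains it.
// Hypothetical helper, not part of FluidAudio.
func resolveModel(
    named fileName: String,
    in localDirectories: [URL],
    download: (String) throws -> URL
) rethrows -> URL {
    for directory in localDirectories {
        let candidate = directory.appendingPathComponent(fileName)
        // fileExists(atPath:) is also true for directories, so this works
        // for compiled model bundles that are stored as folders.
        if FileManager.default.fileExists(atPath: candidate.path) {
            return candidate
        }
    }
    // No local copy found: fetch it once; the caller decides where to store it.
    return try download(fileName)
}
```

In an app, `localDirectories` could list the bundle's resource directory (`Bundle.main.resourceURL`) first and a caches directory second, so a model shipped inside the Xcode project is used without any network access.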

Open Issues Need Help