IPA Phonemizer/Dephonemizer for 139 human languages

g2p ipa p2g phonemizer phonetic-algorithm phonetic-conversion
5 Open Issues Need Help. Last updated: Jul 10, 2025

AI Summary: The task is to train new Taiwanese Hokkien and Min Nan grapheme-to-phoneme models for the Goruut project, incorporating support for eight tones. This involves identifying the Hugging Face dataset previously used and utilizing updated Python code to train the models. The goal is to improve the accuracy and tonal representation of the phonemizer for these languages.

Complexity: 4/5
help wanted dataset

Go

AI Summary: Implement a feature in the Goruut project that allows users to specify the path to a custom pronunciation model, enabling the use of larger models that are not included in the repository. This addresses the issue of large models like the Hebrew model being too large to be hosted within the project.
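
One way to picture the requested feature is a small resolver that prefers a user-supplied model path and falls back to a bundled model. The following is a minimal sketch, not Goruut's actual code: the `-model-path` flag, the `GORUUT_MODEL_PATH` environment variable, the default path `models/default.bin`, and the precedence order are all assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolveModelPath picks where to load a pronunciation model from.
// Precedence (an assumption of this sketch): explicit flag value first,
// then the environment value, then the model bundled with the project.
func resolveModelPath(flagValue, envValue, bundledDefault string) string {
	if flagValue != "" {
		return flagValue
	}
	if envValue != "" {
		return envValue
	}
	return bundledDefault
}

func main() {
	// Hypothetical flag name; the project may expose this differently.
	modelFlag := flag.String("model-path", "", "path to a custom pronunciation model")
	flag.Parse()

	// Hypothetical environment variable and bundled default path.
	path := resolveModelPath(*modelFlag, os.Getenv("GORUUT_MODEL_PATH"), "models/default.bin")

	// Only check that the file exists; real model loading is out of scope here.
	if _, err := os.Stat(path); err != nil {
		fmt.Printf("model %q is not readable: %v\n", path, err)
		return
	}
	fmt.Printf("would load model from %q\n", path)
}
```

Keeping the resolution logic in a pure function makes the precedence easy to test without touching the file system, and lets a large model such as the Hebrew one live outside the repository.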

Complexity: 4/5
enhancement help wanted backend

Train english (about 2 months ago)

AI Summary: The task involves improving the English language support in the Goruut IPA phonemizer/dephonemizer, specifically addressing issues with Google homograph disambiguation. This likely requires training or fine-tuning the existing grapheme-to-phoneme model on a larger and more diverse English dataset, potentially focusing on resolving ambiguities between words with identical spellings but different pronunciations.

Complexity: 4/5
help wanted dataset

Train hebrew3 (about 2 months ago)
help wanted dataset

tag predictor model (about 2 months ago)

AI Summary: The task involves evaluating the efficiency of adding a tag predictor model to an existing IPA phonemizer. The decision hinges on whether the compressed size of the dictionary with the added tag words and the tag predictor model is smaller than the compressed size of the dictionary without the tag words. This requires comparing the zipped sizes of different dictionary versions.
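
The size comparison described above can be sketched in a few lines. This is an illustrative sketch only: it uses gzip from the Go standard library as a stand-in for "zipped", the function names are hypothetical, and the in-memory byte slices in `main` are placeholders rather than real Goruut dictionaries or models.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// gzipSize returns how many bytes data occupies after gzip compression.
func gzipSize(data []byte) (int, error) {
	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
	if err != nil {
		return 0, err
	}
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	// Close flushes the remaining compressed bytes and the gzip trailer.
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// smallerWithPredictor follows the summary's wording: it reports whether
// the dictionary with tag words plus the predictor model compress to fewer
// bytes in total than the dictionary without tag words.
func smallerWithPredictor(dictWithTags, predictor, dictWithoutTags []byte) (bool, error) {
	a, err := gzipSize(dictWithTags)
	if err != nil {
		return false, err
	}
	b, err := gzipSize(predictor)
	if err != nil {
		return false, err
	}
	c, err := gzipSize(dictWithoutTags)
	if err != nil {
		return false, err
	}
	return a+b < c, nil
}

func main() {
	// Placeholder inputs; real use would read the files from disk.
	withTags := []byte("word\ttag\tipa\n")
	predictor := []byte("placeholder model bytes")
	withoutTags := []byte("word\tipa\n")

	ok, err := smallerWithPredictor(withTags, predictor, withoutTags)
	if err != nil {
		fmt.Println("compression failed:", err)
		return
	}
	fmt.Println("predictor variant is smaller:", ok)
}
```

Compressing each artifact separately mirrors shipping them as separate files; if they were packed into a single archive, the concatenated bytes would need to be compressed together instead.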

Complexity: 4/5
enhancement good first issue analysis2
