Open Issues Need Help
AI Summary: Train new Taiwanese Hokkien and Min Nan grapheme-to-phoneme models for the Goruut project (an IPA phonemizer/dephonemizer for 139 human languages), with support for all eight tones. This involves identifying the Hugging Face dataset used previously and retraining with the updated Python training code, with the goal of improving the accuracy and tonal representation of the phonemizer for these languages.
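Before retraining, it is worth confirming that the tonal material is actually present in the data. The sketch below assumes the training data is a Hugging Face dataset with an `ipa` column written with IPA tone letters; the dataset name and column names are hypothetical placeholders, since identifying the dataset that was used previously is itself part of the task.

```python
# Minimal sketch: check that all eight tone contours appear in the training data
# before retraining. The dataset name and the "ipa" column are hypothetical
# placeholders; the real dataset still has to be identified.
from collections import Counter

from datasets import load_dataset

TONE_LETTERS = "˥˦˧˨˩"  # IPA (Chao) tone letters used to write tone contours

ds = load_dataset("example-org/taiwanese-hokkien-lexicon", split="train")  # hypothetical name

contours = Counter()
for row in ds:
    run = ""
    for ch in row["ipa"]:
        if ch in TONE_LETTERS:
            run += ch            # extend the current tone contour
        elif run:
            contours[run] += 1   # a contour just ended
            run = ""
    if run:
        contours[run] += 1

print(f"{len(contours)} distinct tone contours found (roughly eight are expected)")
for contour, count in contours.most_common():
    print(contour, count)
```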
AI Summary: Implement a feature in the Goruut project that lets users specify the path to a custom pronunciation model, so that larger models not included in the repository can still be used. This addresses the problem that some models, such as the Hebrew model, are too large to be hosted in the repository itself.
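As an illustration of the intended behaviour only, the sketch below shows a user-supplied path taking precedence over a bundled model; the flag, function, and directory names are hypothetical and do not reflect Goruut's actual API.

```python
# Minimal sketch of the intended behaviour, not Goruut's actual API: a
# user-supplied --model-path overrides the model bundled with the repository.
# Flag, function, and directory names are hypothetical.
import argparse
from pathlib import Path

DEFAULT_MODEL_DIR = Path("models")  # hypothetical location of the bundled models


def resolve_model_path(language: str, custom_path: str | None) -> Path:
    """Return the pronunciation model path to load for a language.

    A custom path, if given, takes precedence over the bundled model, which is
    what would let large models such as the Hebrew one live outside the repo.
    """
    if custom_path:
        path = Path(custom_path).expanduser()
        if not path.exists():
            raise FileNotFoundError(f"custom model not found: {path}")
        return path
    return DEFAULT_MODEL_DIR / language


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("language")
    parser.add_argument("--model-path", default=None,
                        help="path to a custom pronunciation model (optional)")
    args = parser.parse_args()
    print(resolve_model_path(args.language, args.model_path))
```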
AI Summary: Improve English language support in the Goruut IPA phonemizer/dephonemizer, specifically by addressing issues with Google homograph disambiguation. This likely requires training or fine-tuning the existing grapheme-to-phoneme model on a larger, more diverse English dataset, with a focus on resolving ambiguities between words that share a spelling but differ in pronunciation (for example, "read" as /ɹiːd/ versus /ɹɛd/).
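The sketch below illustrates what a simple homograph regression check could look like; the `phonemize(sentence, language)` callable it expects is a hypothetical stand-in for the phonemizer interface, and the test cases are ordinary English homographs.

```python
# Minimal sketch of a homograph regression check. The phonemize(sentence, language)
# callable passed in is a hypothetical stand-in for the phonemizer interface.
HOMOGRAPH_CASES = [
    # (sentence, homograph, expected IPA for the homograph in this context)
    ("She will read the report tomorrow.", "read", "ɹiːd"),
    ("He read the report yesterday.", "read", "ɹɛd"),
    ("She will lead the team.", "lead", "liːd"),
    ("The pipe is made of lead.", "lead", "lɛd"),
]


def evaluate_homographs(phonemize) -> float:
    """Return the fraction of cases whose expected IPA appears in the phonemized sentence."""
    correct = 0
    for sentence, word, expected_ipa in HOMOGRAPH_CASES:
        output = phonemize(sentence, language="english")
        if expected_ipa in output:
            correct += 1
        else:
            print(f"miss: {word!r} in {sentence!r} -> {output}")
    return correct / len(HOMOGRAPH_CASES)
```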
AI Summary: Evaluate whether adding a tag predictor model to the existing IPA phonemizer is efficient in terms of size. The decision hinges on whether the compressed size of the dictionary with the added tag words, plus the tag predictor model, is smaller than the compressed size of the dictionary without the tag words; this requires comparing the zipped sizes of the different dictionary versions.
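A minimal sketch of that comparison follows; the filenames are hypothetical, and zlib compression is used as a stand-in for zipping the files.

```python
# Minimal sketch of the size comparison described above. Filenames are
# hypothetical, and zlib compression stands in for zipping the files.
import zlib
from pathlib import Path


def compressed_size(path: Path, level: int = 9) -> int:
    """Size in bytes of the file after zlib compression at the given level."""
    return len(zlib.compress(path.read_bytes(), level))


# Variant A: dictionary with the added tag words, shipped alongside the tag predictor.
with_tags = compressed_size(Path("dict_with_tag_words.tsv")) + compressed_size(
    Path("tag_predictor_model.bin")
)

# Variant B: dictionary without the tag words.
without_tags = compressed_size(Path("dict_without_tag_words.tsv"))

print(f"with tag words + tag predictor: {with_tags} bytes compressed")
print(f"without tag words:              {without_tags} bytes compressed")

# Per the issue's criterion, the tag predictor is only worth adding if variant A
# compresses smaller than variant B.
print("add the tag predictor" if with_tags < without_tags else "skip the tag predictor")
```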