Semantic deduplication for multilingual training corpora — fast, offline, open

12 stars, 0 forks, 12 watchers
Language: Python
License: Apache License 2.0
Topics: faiss, multilingual, nlp, open-source, python
2 open issues need help. Last updated: Jul 5, 2026

Open Issues Need Help

Add incremental indexing (about 3 hours ago)
Labels: bug, enhancement, help wanted
