Truly universal encoding detector in pure Python

chardet charset-conversion charset-detection encoding python unicode
3 Open Issues Need Help Last updated: Jun 17, 2025

Open Issues Need Help

View All on GitHub

AI Summary: Debug a bug in the charset-normalizer library where the detection of UTF-8 encoded text containing a Unicode character (▶) incorrectly identifies it as UTF-16LE. The task involves reproducing the bug using the provided Python code, investigating the cause of the misidentification, and potentially proposing a fix or improvement to the library's detection algorithm.

Complexity: 4/5
bug help wanted

Truly universal encoding detector in pure Python

Python
#chardet#charset-conversion#charset-detection#encoding#python#unicode

AI Summary: The task is to debug a bug in the `charset-normalizer` Python library where ISO-8859 encoded text is incorrectly detected as UTF-16-be. The solution requires investigating the library's detection algorithm, reproducing the bug with provided code examples, and potentially modifying the algorithm or adding exception handling to correctly identify ISO-8859 encodings.

Complexity: 4/5
bug help wanted

Truly universal encoding detector in pure Python

Python
#chardet#charset-conversion#charset-detection#encoding#python#unicode

AI Summary: Correct a typo in the `pyproject.toml` file of the `charset-normalizer` project. The typo is in the `scripts` section, where the path to the CLI script is incorrectly specified. The correct path needs to be substituted.

Complexity: 1/5
bug help wanted

Truly universal encoding detector in pure Python

Python
#chardet#charset-conversion#charset-detection#encoding#python#unicode