Open Issues Need Help
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
AI Summary: The `InAadhaarRecognizer` in `presidio-analyzer` fails to identify Indian Aadhaar card numbers when they are formatted with hyphens (`xxxx-xxxx-xxxx`) or spaces (`xxxx xxxx xxxx`). It currently only recognizes the plain 12-digit format (`xxxxxxxxxxxx`), despite the other formats being standard representations. The issue requires updating the recognizer's pattern to include these common variations.
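As a rough illustration of the fix described in this summary, the sketch below matches all three formats with one regular expression. It uses only Python's `re` module; the names `AADHAAR_PATTERN` and `find_aadhaar_candidates` are hypothetical and are not taken from `presidio-analyzer`, and the pattern is an assumption about what an updated recognizer might accept, not the project's actual regex. It finds candidates only and performs no checksum validation.

```python
import re

# Hypothetical pattern (not the recognizer's actual regex): three groups of
# four digits, joined by the same optional separator (hyphen or space).
# The backreference \1 rejects mixed separators such as "1234-5678 9012".
# The lookarounds keep the match from starting or ending inside a longer
# run of digits.
AADHAAR_PATTERN = re.compile(r"(?<!\d)\d{4}([- ]?)\d{4}\1\d{4}(?!\d)")


def find_aadhaar_candidates(text: str) -> list[str]:
    """Return every substring of `text` that looks like an Aadhaar number."""
    return [match.group(0) for match in AADHAAR_PATTERN.finditer(text)]


if __name__ == "__main__":
    for sample in ("1234-5678-9012", "1234 5678 9012", "123456789012"):
        print(sample, "->", find_aadhaar_candidates(sample))
```

Requiring the same separator in both positions is a design choice made here for strictness; a recognizer that should also accept mixed separators could replace `\1` with a second `[- ]?`.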
AI Summary: Enhance Presidio's documentation to provide detailed instructions on building custom Docker images, specifically addressing multilingual support. This includes explaining which YAML files to modify, common pitfalls (like memory limitations when adding many languages), and troubleshooting warnings such as those related to missing NLP recognizers.
AI Summary: Implement input validation in Presidio's `NlpEngineProvider` to check the `nlp_engines`, `conf_file`, and `nlp_configuration` arguments. This involves adding checks for correct data types, file existence, and format compliance, and raising clear error messages when invalid input is detected.
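The kind of checks this summary asks for could look roughly like the standalone sketch below. It is not Presidio code: `validate_provider_args` is a hypothetical helper, the accepted types for each argument are assumptions, and the required configuration keys (`nlp_engine_name`, `models`) are assumed for illustration rather than confirmed by the issue.

```python
from pathlib import Path
from typing import Any, Optional, Union

# Assumed for illustration: keys a configuration dictionary must contain.
REQUIRED_CONFIG_KEYS = {"nlp_engine_name", "models"}


def validate_provider_args(
    nlp_engines: Optional[Any] = None,
    conf_file: Optional[Union[str, Path]] = None,
    nlp_configuration: Optional[Any] = None,
) -> None:
    """Raise a descriptive error for the first invalid argument found."""
    # Data type check: engines are assumed to arrive as a list or tuple.
    if nlp_engines is not None and not isinstance(nlp_engines, (list, tuple)):
        raise TypeError(
            f"nlp_engines must be a list or tuple, got {type(nlp_engines).__name__}"
        )

    if conf_file is not None:
        if not isinstance(conf_file, (str, Path)):
            raise TypeError(
                f"conf_file must be a str or Path, got {type(conf_file).__name__}"
            )
        # File existence check.
        if not Path(conf_file).is_file():
            raise FileNotFoundError(f"conf_file does not exist: {conf_file}")

    if nlp_configuration is not None:
        if not isinstance(nlp_configuration, dict):
            raise TypeError(
                "nlp_configuration must be a dict, "
                f"got {type(nlp_configuration).__name__}"
            )
        # Format check: report every missing key at once.
        missing = REQUIRED_CONFIG_KEYS - nlp_configuration.keys()
        if missing:
            raise ValueError(
                f"nlp_configuration is missing keys: {sorted(missing)}"
            )
```

Calling the helper with no arguments, or with valid ones, returns `None`; any invalid argument raises before an engine would be constructed, so the error message points at the input rather than at a later failure.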
AI Summary: Organize the growing list of predefined PII recognizers in the Presidio framework into subfolders based on geographic location (country-specific), global applicability, and underlying NER technology used. The solution must maintain backward compatibility.