An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

anonymization data-anonymization data-masking data-obfuscation data-privacy data-redaction de-identification guardrails image-redactor named-entity-recognition nlp personally-identifiable-information phi pii pii-detection privacy python sensitive-data spacy transformers
5 Open Issues Need Help (last updated: Sep 14, 2025)

bug good first issue analyzer

Python
AI Summary: The `InAadhaarRecognizer` in `presidio-analyzer` fails to identify Indian Aadhaar card numbers when they are formatted with hyphens (`xxxx-xxxx-xxxx`) or spaces (`xxxx xxxx xxxx`). It currently only recognizes the plain 12-digit format (`xxxxxxxxxxxx`), despite the other formats being standard representations. The issue requires updating the recognizer's pattern to include these common variations.

Complexity: 2/5
good first issue analyzer
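
The pattern change described for `InAadhaarRecognizer` can be illustrated with a standalone regular expression. This is a minimal sketch using only Python's `re` module, not the recognizer's actual code: the names `AADHAAR_CANDIDATE` and `find_aadhaar_candidates` are hypothetical, and the sketch only finds format candidates without any further validation a real recognizer might apply.

```python
import re
from typing import List

# Assumed pattern (not taken from presidio-analyzer): three groups of four
# digits joined by the same optional separator (hyphen, space, or nothing).
# The backreference \1 forces both separators to be identical, and the
# lookarounds reject matches embedded in longer digit or hyphen runs.
AADHAAR_CANDIDATE = re.compile(r"(?<![\d-])\d{4}([- ]?)\d{4}\1\d{4}(?![\d-])")


def find_aadhaar_candidates(text: str) -> List[str]:
    """Return every format candidate, normalized to 12 plain digits."""
    return [
        re.sub(r"\D", "", match.group(0))
        for match in AADHAAR_CANDIDATE.finditer(text)
    ]


if __name__ == "__main__":
    sample = "IDs: 1234-5678-9012, 1234 5678 9012 and 123456789012."
    print(find_aadhaar_candidates(sample))
```

Normalizing to plain digits means any downstream check only has to handle one representation, whichever of the three formats appeared in the text.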

AI Summary: Enhance Presidio's documentation to provide detailed instructions on building custom Docker images, specifically addressing multilingual support. This includes explaining which YAML files to modify, common pitfalls (like memory limitations when adding many languages), and troubleshooting warnings such as those related to missing NLP recognizers.

Complexity: 4/5
good first issue documentation

AI Summary: Implement input validation in Presidio's `NlpEngineProvider` to check the `nlp_engines`, `conf_file`, and `nlp_configuration` arguments. This involves checking data types, file existence, and format compliance, and raising clear error messages when invalid input is detected.

Complexity: 3/5
bug good first issue analyzer
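
The kind of checks the `NlpEngineProvider` issue asks for might look like the sketch below. It is a standalone helper, not Presidio code: the function name `validate_provider_args` is hypothetical, and the expected shapes (a list or tuple for `nlp_engines`, a dict with `nlp_engine_name` and `models` keys for `nlp_configuration`) are assumptions made for illustration.

```python
from pathlib import Path
from typing import Any, Optional, Sequence, Union


def validate_provider_args(
    nlp_engines: Optional[Sequence[Any]] = None,
    conf_file: Optional[Union[str, Path]] = None,
    nlp_configuration: Optional[dict] = None,
) -> None:
    """Raise a descriptive error for the first invalid argument found."""
    if nlp_engines is not None:
        # Assumed shape: a non-empty list or tuple of engine classes.
        if not isinstance(nlp_engines, (list, tuple)) or not nlp_engines:
            raise ValueError("nlp_engines must be a non-empty list or tuple")

    if conf_file is not None:
        if not isinstance(conf_file, (str, Path)):
            raise TypeError("conf_file must be a str or pathlib.Path")
        if not Path(conf_file).is_file():
            raise FileNotFoundError(f"conf_file does not exist: {conf_file}")

    if nlp_configuration is not None:
        if not isinstance(nlp_configuration, dict):
            raise TypeError("nlp_configuration must be a dict")
        # Assumed required keys, chosen for this sketch.
        missing = {"nlp_engine_name", "models"} - nlp_configuration.keys()
        if missing:
            raise ValueError(
                f"nlp_configuration is missing keys: {sorted(missing)}"
            )
```

Failing early with a specific exception type lets callers distinguish a missing file from a malformed configuration instead of hitting an unrelated error later during engine creation.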

AI Summary: Organize the growing list of predefined PII recognizers in the Presidio framework into subfolders based on geographic location (country-specific), global applicability, and underlying NER technology used. The solution must maintain backward compatibility.

Complexity: 4/5
enhancement good first issue analyzer
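
One possible way to keep old import paths working after recognizers move into subfolders is a module-level `__getattr__` (PEP 562) that forwards old names to their new locations. The sketch below simulates this with in-memory modules so it runs on its own. All module names (`demo_recognizers`, `demo_recognizers.country_specific.india`) and the placeholder class are hypothetical, and the deprecation warning is an optional design choice, not something the issue specifies.

```python
import importlib
import sys
import types
import warnings


class InAadhaarRecognizer:
    """Placeholder standing in for a recognizer; not the real Presidio class."""


# Simulated new location: a country-specific submodule.
new_module = types.ModuleType("demo_recognizers.country_specific.india")
new_module.InAadhaarRecognizer = InAadhaarRecognizer
sys.modules[new_module.__name__] = new_module

# Map of old top-level names to the modules they moved to.
_MOVED = {"InAadhaarRecognizer": "demo_recognizers.country_specific.india"}


def _forward_moved_name(name: str):
    """Resolve names that used to live in the flat package."""
    if name in _MOVED:
        warnings.warn(
            f"{name} has moved to {_MOVED[name]}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(_MOVED[name]), name)
    raise AttributeError(f"module 'demo_recognizers' has no attribute {name!r}")


# Simulated old flat package, kept as a thin compatibility layer.
old_module = types.ModuleType("demo_recognizers")
old_module.__getattr__ = _forward_moved_name
sys.modules[old_module.__name__] = old_module

if __name__ == "__main__":
    # The old import path still resolves to the relocated class.
    from demo_recognizers import InAadhaarRecognizer as OldPath

    print(OldPath is InAadhaarRecognizer)
```

In a real package the same effect is often achieved more simply by re-exporting the moved classes from the old package's `__init__.py`; the lazy variant shown here only adds the ability to warn on use of the old path.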
