Text Anonymizer
Anonymize sensitive data in text
Text anonymization detects and masks PII (emails, phone numbers, IBANs, credit cards, IPs) to protect privacy and comply with GDPR.
What is Text Anonymization?
Text anonymization is the process of detecting and removing or masking personally identifiable information (PII) from text data. The goal is to make it impossible to identify specific individuals from the processed text, while preserving the text’s utility for analysis, sharing, or publication.
PII includes any information that can directly or indirectly identify a person: names, email addresses, phone numbers, bank account numbers (IBANs), credit card numbers, IP addresses, social security numbers, and physical addresses. Even seemingly innocuous data points can become PII when combined: according to one widely cited estimate, birth date, ZIP code, and gender together uniquely identify 87% of the US population.
Anonymization is distinct from encryption. Encrypted data can be reversed with the right key. Anonymized data is permanently transformed — the original PII is replaced or removed, and there is no key to recover it.
Anonymization vs Pseudonymization
These terms are often confused, but the distinction has significant legal implications under GDPR:
Anonymization irreversibly removes the link between data and the individual. No one — not even the data controller — can re-identify the person. Anonymized data is no longer personal data under GDPR.
Pseudonymization replaces identifying information with artificial identifiers (pseudonyms) while maintaining a mapping table. The data can be re-identified by someone with access to the mapping. Pseudonymized data is still personal data under GDPR.
| Aspect | Anonymization | Pseudonymization |
|---|---|---|
| Reversible | No | Yes (with mapping) |
| GDPR status | Not personal data | Still personal data |
| Data utility | Lower (info lost) | Higher (structure preserved) |
| Re-identification risk | None, if anonymization is truly achieved | Present (anyone with the mapping can re-identify) |
| Use case | Public datasets, research | Internal processing, analytics |
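The difference in reversibility can be illustrated with a minimal Python sketch. It is not taken from any particular tool: the function names (`anonymize`, `pseudonymize`, `reidentify`) and the `<EMAIL_n>` pseudonym format are assumptions made for illustration, and only email addresses are handled.

```python
import re

# Email pattern of the kind used for regex-based PII detection.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def anonymize(text: str) -> str:
    """Replace every email with a fixed mask. Nothing is stored, so
    the original values cannot be recovered from the output."""
    return EMAIL_RE.sub("[EMAIL]", text)


def pseudonymize(text: str, mapping: dict) -> str:
    """Replace every email with a stable pseudonym and record the
    original -> pseudonym pair in `mapping` (hypothetical format)."""
    def replace(match):
        email = match.group(0)
        if email not in mapping:
            mapping[email] = f"<EMAIL_{len(mapping) + 1}>"
        return mapping[email]

    return EMAIL_RE.sub(replace, text)


def reidentify(text: str, mapping: dict) -> str:
    """Reverse pseudonymization. Only possible with the mapping table."""
    for original, pseudonym in mapping.items():
        text = text.replace(pseudonym, original)
    return text


mapping = {}
text = "Mail john@example.com and jane@example.org, then john@example.com again."
print(anonymize(text))
print(pseudonymize(text, mapping))
```

In this sketch the pseudonymized output keeps the fact that the first and third addresses belong to the same person, which is why it preserves more utility, and also why the mapping table itself must be protected as personal data.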
In practice, true anonymization is difficult to achieve. Researchers have repeatedly demonstrated that supposedly anonymized datasets can be re-identified through cross-referencing with other data sources.
PII Detection Methods
Text anonymizers use several techniques to identify PII in unstructured text:
| PII Type | Detection Method | Mask Format | Example |
|---|---|---|---|
| Email | Regex: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} | [EMAIL] | john@example.com |
| Phone | Regex: international patterns with country codes | [PHONE] | +33 6 12 34 56 78 |
| IBAN | Regex + MOD-97 validation | [IBAN] | FR7630006000011234567890189 |
| Credit Card | Regex + Luhn checksum | [CREDIT_CARD] | 4111 1111 1111 1111 |
| IPv4 | Regex: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} | [IP] | 192.168.1.1 |
| IPv6 | Regex: hex groups with colons | [IP] | 2001:db8::1 |
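The two checksum validations in the table reduce false positives: a digit sequence that merely looks like a card number or IBAN is only masked if its check digits are consistent. The following is a minimal sketch of both checks; the function names `luhn_valid` and `iban_valid` are illustrative, and the IBAN check does not verify country-specific lengths.

```python
import re


def luhn_valid(candidate: str) -> bool:
    """Luhn checksum: double every second digit from the right,
    subtract 9 from results above 9, and require the sum % 10 == 0."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not 13 <= len(digits) <= 19:  # typical card number lengths
        return False
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def iban_valid(candidate: str) -> bool:
    """MOD-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and require value % 97 == 1."""
    iban = re.sub(r"\s+", "", candidate).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    rearranged = iban[4:] + iban[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(luhn_valid("4111 1111 1111 1111"))          # example from the table
print(iban_valid("FR7630006000011234567890189"))  # example from the table
```

Both example values from the table pass their respective checks, while changing a single digit in either one makes the check fail.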
Beyond Regex: Named Entity Recognition
For names, addresses, and other free-form PII, regex alone is insufficient. Advanced anonymization tools use NER (Named Entity Recognition) — machine learning models trained to identify entities like person names, organizations, and locations in text.
NER tools (spaCy, Presidio, Hugging Face transformers) can detect PII that follows no fixed pattern, such as “Dr. Sarah Chen” or “42 Rue de Rivoli, Paris”, but they require more computational resources and can produce false positives (for example, flagging “Paris” as a person's name when it refers to the city).
Client-side anonymization tools typically rely on regex patterns for performance and privacy. For maximum coverage, combine automated detection with manual review.
GDPR and CCPA Requirements
Two major regulations drive the demand for text anonymization:
GDPR (General Data Protection Regulation, EU, 2018) requires a lawful basis for processing personal data, grants individuals rights over their data (access, erasure, portability), and imposes severe penalties for non-compliance: fines of up to €20 million or 4% of global annual turnover, whichever is higher. Anonymization is a key strategy for GDPR compliance: once data is truly anonymized, it falls outside the scope of GDPR.
CCPA (California Consumer Privacy Act, 2020) gives California residents rights over their personal information, including the right to know what data is collected, the right to delete it, and the right to opt out of its sale. As under GDPR, data that has been properly anonymized (“deidentified” in the CCPA's terminology) is exempt.
Both regulations emphasize that anonymization must be irreversible. If there is any reasonable means of re-identifying the data subjects, the data is pseudonymized, not anonymized, and remains subject to the regulation.
Common Use Cases
- Sharing logs and tickets: Anonymize support tickets, error logs, and bug reports before sharing with external vendors or posting on public forums
- Test data generation: Mask PII in production data to create realistic but safe test datasets for development and QA environments
- Research and analytics: Anonymize customer feedback, survey responses, and transaction records for aggregate analysis without privacy concerns
- Regulatory compliance: Demonstrate GDPR/CCPA compliance by showing that shared or published data contains no PII
- LLM prompt safety: Remove PII from text before sending it to external AI models to prevent personal data from entering training datasets
Try These Examples
Input: Contact John at john.doe@example.com or +33 6 12 34 56 78. IBAN: FR7630006000011234567890189. IP: 192.168.1.42
This text contains four types of PII: an email address, a phone number, an IBAN, and an IP address. The anonymizer detects and masks each one, producing: 'Contact John at [EMAIL] or [PHONE]. IBAN: [IBAN]. IP: [IP]'.
Input: The quarterly report shows a 15% increase in revenue compared to last year.
This text contains no personally identifiable information. The anonymizer returns it unchanged. Numbers like '15%' are not flagged because they don't match PII patterns.
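For readers who want to reproduce these two examples outside the tool, here is a rough Python sketch of a regex-only masking pass. It is not the tool's actual implementation: the email and IPv4 patterns follow the detection table, while the phone and IBAN patterns, the `PATTERNS` list, and the `mask_pii` function are simplified assumptions. Checksum validation is omitted, and the IPv4 pattern accepts out-of-range octets such as 999.

```python
import re

# Order matters: IPs are masked before phones so that a dotted
# address is never partially consumed by the looser phone pattern.
PATTERNS = [
    ("[EMAIL]", re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")),
    # Assumed shape: country code, two check digits, 11-30 alphanumerics.
    ("[IBAN]", re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")),
    ("[IP]", re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")),
    # Assumed shape: "+" country code followed by 2-6 digit groups.
    ("[PHONE]", re.compile(r"\+\d{1,3}(?:[ .-]?\d{1,4}){2,6}")),
]


def mask_pii(text: str) -> str:
    """Apply each pattern in turn, replacing matches with its mask."""
    for mask, pattern in PATTERNS:
        text = pattern.sub(mask, text)
    return text


with_pii = (
    "Contact John at john.doe@example.com or +33 6 12 34 56 78. "
    "IBAN: FR7630006000011234567890189. IP: 192.168.1.42"
)
without_pii = (
    "The quarterly report shows a 15% increase in revenue "
    "compared to last year."
)

print(mask_pii(with_pii))
print(mask_pii(without_pii))
```

Note that the name "John" passes through untouched: names follow no fixed pattern, so a regex-only pass like this one cannot detect them.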