Text Anonymizer

Anonymize sensitive data in text

Understanding Text Anonymization

TL;DR

Text anonymization detects and masks PII — emails, phone numbers, IBANs, credit cards, IPs — to protect privacy and comply with GDPR.

What is Text Anonymization?

Text anonymization is the process of detecting and removing or masking personally identifiable information (PII) from text data. The goal is to make it impossible to identify specific individuals from the processed text, while preserving the text’s utility for analysis, sharing, or publication.

PII includes any information that can directly or indirectly identify a person: names, email addresses, phone numbers, bank account numbers (IBANs), credit card numbers, IP addresses, social security numbers, and physical addresses. Even seemingly innocuous data points can become PII when combined. One widely cited study estimated that the combination of birth date, five-digit ZIP code, and gender is unique for about 87% of the US population.

Anonymization is distinct from encryption. Encrypted data can be reversed with the right key. Anonymized data is permanently transformed — the original PII is replaced or removed, and there is no key to recover it.

Anonymization vs Pseudonymization

These terms are often confused, but the distinction has significant legal implications under GDPR:

Anonymization irreversibly removes the link between data and the individual. No one — not even the data controller — can re-identify the person. Anonymized data is no longer personal data under GDPR.

Pseudonymization replaces identifying information with artificial identifiers (pseudonyms) while maintaining a mapping table. The data can be re-identified by someone with access to the mapping. Pseudonymized data is still personal data under GDPR.
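To make the distinction concrete, here is a minimal Python sketch that handles email addresses both ways. The function names and the `USER_n` pseudonym format are hypothetical; only the email regex is taken from the detection table below. The anonymized output keeps nothing that could undo the masking, while the pseudonymized output comes with a mapping table that reverses it.

```python
import itertools
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def anonymize(text: str) -> str:
    """Replace every email with the same fixed mask; no mapping is kept."""
    return EMAIL_RE.sub("[EMAIL]", text)


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a stable pseudonym (hypothetical USER_n format).

    Returns the rewritten text and the mapping table (pseudonym -> original).
    Anyone holding the mapping can re-identify the data.
    """
    mapping: dict[str, str] = {}   # pseudonym -> original value
    reverse: dict[str, str] = {}   # original value -> pseudonym
    counter = itertools.count(1)

    def replace(match: re.Match) -> str:
        original = match.group(0)
        if original not in reverse:
            pseudonym = f"USER_{next(counter)}"
            reverse[original] = pseudonym
            mapping[pseudonym] = original
        return reverse[original]

    return EMAIL_RE.sub(replace, text), mapping


def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Undo pseudonymization with the mapping table."""
    return re.sub(r"USER_\d+", lambda m: mapping.get(m.group(0), m.group(0)), text)


text = "Mail alice@example.com, then alice@example.com again, or bob@example.org."
print(anonymize(text))        # Mail [EMAIL], then [EMAIL] again, or [EMAIL].
masked, table = pseudonymize(text)
print(masked)                 # Mail USER_1, then USER_1 again, or USER_2.
print(reidentify(masked, table) == text)  # True
```

Note that the pseudonymized text preserves structure (the same person appears twice as `USER_1`), which is why it retains more analytical utility.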

Aspect	Anonymization	Pseudonymization
Reversible	No	Yes (with mapping)
GDPR status	Not personal data	Still personal data
Data utility	Lower (info lost)	Higher (structure preserved)
Re-identification	Not possible (if done properly)	Possible (with mapping)
Use case	Public datasets, research	Internal processing, analytics

In practice, true anonymization is difficult to achieve. Researchers have repeatedly demonstrated that supposedly anonymized datasets can be re-identified through cross-referencing with other data sources.

PII Detection Methods

Text anonymizers use several techniques to identify PII in unstructured text:

PII Type	Detection Method	Mask Format	Example
Email	Regex: `[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`	`[EMAIL]`	`john@example.com`
Phone	Regex: international patterns with country codes	`[PHONE]`	`+33 6 12 34 56 78`
IBAN	Regex + MOD-97 validation	`[IBAN]`	`FR7630006000011234567890189`
Credit Card	Regex + Luhn checksum	`[CREDIT_CARD]`	`4111 1111 1111 1111`
IPv4	Regex: `\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`	`[IP]`	`192.168.1.1`
IPv6	Regex: hex groups with colons	`[IP]`	`2001:db8::1`
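The checksum steps named in the table filter out digit runs that merely look like card numbers or IBANs. A minimal Python sketch of both checks follows; the function names are hypothetical, and the 13-19 digit card length range and the strict IBAN shape check are assumptions of this sketch rather than rules taken from the tool.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if a card-number candidate passes the Luhn checksum."""
    compact = re.sub(r"[ -]", "", number)            # allow space/hyphen grouping
    if not re.fullmatch(r"[0-9]{13,19}", compact):   # assumed card-length range
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(int(c) for c in reversed(compact)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """Return True if an IBAN candidate passes the MOD-97 check (remainder must be 1)."""
    compact = re.sub(r"\s", "", iban).upper()
    # Country code, two check digits, then 11-30 alphanumeric characters.
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", compact):
        return False
    rearranged = compact[4:] + compact[:4]                     # move first four chars to the end
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)   # digits stay, A=10 ... Z=35
    return int(numeric) % 97 == 1


print(luhn_valid("4111 1111 1111 1111"))          # True
print(luhn_valid("4111 1111 1111 1112"))          # False
print(iban_valid("FR7630006000011234567890189"))  # True
```

Changing a single digit breaks both checksums, which is what lets an anonymizer skip most random numbers that happen to match the regex.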

Beyond Regex: Named Entity Recognition

For names, addresses, and other free-form PII, regex alone is insufficient. Advanced anonymization tools use NER (Named Entity Recognition) — machine learning models trained to identify entities like person names, organizations, and locations in text.

NER-based tools (spaCy, Microsoft Presidio, Hugging Face transformers) can detect PII that follows no fixed pattern, such as “Dr. Sarah Chen” or “42 Rue de Rivoli, Paris”. However, they require more computational resources and can produce false positives (for example, tagging “Paris” as a person’s name when the text refers to the city).

Client-side anonymization tools typically rely on regex patterns: they run quickly in the browser, and the text never has to leave the user’s device. For maximum coverage, combine automated detection with manual review.
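A detect-then-review workflow can be sketched by separating detection from masking: the detector lists candidate spans, a person approves or rejects each one, and only approved spans are masked. This minimal Python sketch uses the two regexes given verbatim in the detection table above; the `Finding` type and function names are hypothetical. The version string in the sample shows the kind of false positive a reviewer would reject.

```python
import re
from dataclasses import dataclass

# The email and IPv4 patterns exactly as listed in the detection table.
PATTERNS = {
    "EMAIL": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "IP": re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"),
}


@dataclass
class Finding:
    kind: str
    value: str
    start: int
    end: int


def find_pii(text: str) -> list[Finding]:
    """Collect candidate PII spans, in text order, for a human to review."""
    findings = [
        Finding(kind, m.group(0), m.start(), m.end())
        for kind, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


def apply_masks(text: str, approved: list[Finding]) -> str:
    """Mask approved findings right to left so earlier offsets stay valid.

    Assumes approved findings do not overlap.
    """
    for f in sorted(approved, key=lambda f: f.start, reverse=True):
        text = text[: f.start] + f"[{f.kind}]" + text[f.end :]
    return text


log = "Server 10.0.0.7 failed; contact ops@example.com. Upgrade to version 2.10.4.1 next."
found = find_pii(log)
# The reviewer rejects "2.10.4.1": it is a version number, not an IP address.
approved = [f for f in found if f.value != "2.10.4.1"]
print(apply_masks(log, approved))
# Server [IP] failed; contact [EMAIL]. Upgrade to version 2.10.4.1 next.
```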

GDPR and CCPA Requirements

Two major regulations drive the demand for text anonymization:

GDPR (General Data Protection Regulation, EU, 2018) requires a lawful basis for processing personal data, grants individuals rights over their data (access, erasure, portability), and imposes fines of up to €20 million or 4% of worldwide annual turnover, whichever is higher. Anonymization is a key strategy for GDPR compliance: once data is truly anonymized, it falls outside the GDPR’s scope (Recital 26).

CCPA (California Consumer Privacy Act, 2020) gives California residents rights over their personal information, including the right to know what data is collected, the right to delete it, and the right to opt out of its sale. As under GDPR, anonymized data (which the CCPA calls “deidentified” information) is exempt.

Both regulations require that anonymization be irreversible in practice. If the data subjects can still be re-identified by reasonably available means (for example, by linking the data with another dataset), the data is at most pseudonymized, not anonymized, and remains subject to the regulation.
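A common pitfall illustrates this rule. Replacing an email address with its unsalted SHA-256 hash looks anonymous, but anyone holding a list of candidate addresses can recompute the hashes and find a match. The minimal Python sketch below (standard-library `hashlib` only; the function names and the lowercase normalization are illustrative assumptions) performs exactly that dictionary lookup. By the reasoning above, such hashed output counts as pseudonymized, not anonymized.

```python
import hashlib
from typing import Optional


def hash_email(email: str) -> str:
    """Replace an email with its SHA-256 hex digest (normalized to lowercase first)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def reidentify_by_dictionary(digest: str, candidates: list[str]) -> Optional[str]:
    """Hash each candidate email and return the one that matches the digest, if any."""
    for candidate in candidates:
        if hash_email(candidate) == digest:
            return candidate
    return None


token = hash_email("jane.doe@example.com")
print(reidentify_by_dictionary(token, ["john@example.com", "jane.doe@example.com"]))
# jane.doe@example.com
```

Because the hash is deterministic, the token behaves like a pseudonym whose "mapping table" can be rebuilt by anyone with the candidate list.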

Common Use Cases

Sharing logs and tickets: Anonymize support tickets, error logs, and bug reports before sharing with external vendors or posting on public forums
Test data generation: Mask PII in production data to create realistic but safe test datasets for development and QA environments
Research and analytics: Anonymize customer feedback, survey responses, and transaction records for aggregate analysis with reduced privacy risk
Regulatory compliance: Demonstrate GDPR/CCPA compliance by showing that shared or published data contains no PII
LLM prompt safety: Remove PII from text before sending it to external AI models, so that personal data is not disclosed to the provider or retained in its logs or training data

Try These Examples

Text with Multiple PII Types

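This text contains four pattern-based PII types: an email address, a phone number, an IBAN, and an IP address. The anonymizer detects and masks each one, producing: 'Contact John at [EMAIL] or [PHONE]. IBAN: [IBAN]. IP: [IP]'. The name “John” is left in place, because pattern-based detection does not cover names (see Beyond Regex: Named Entity Recognition).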

Contact John at john.doe@example.com or +33 6 12 34 56 78. IBAN: FR7630006000011234567890189. IP: 192.168.1.42

Text with No PII

This text contains no personally identifiable information. The anonymizer returns it unchanged. Numbers like '15%' are not flagged because they don't match PII patterns.

The quarterly report shows a 15% increase in revenue compared to last year.
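Both examples can be reproduced with a minimal regex-only sketch in Python. This is not the tool's actual implementation. The pattern order, the phone pattern (international numbers starting with "+"), the restriction to compact IBANs without spaces, and the 0-255 octet check on IPv4 matches are all assumptions made for illustration. The email pattern comes from the detection table, and the IBAN and card matches are confirmed with the MOD-97 and Luhn checksums before masking.

```python
import re

# Hypothetical pattern set, applied in this order. Every mask is digit-free,
# so later digit-based patterns cannot re-match already-masked text.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
IBAN_RE = re.compile(r"\b[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}\b")              # compact IBANs only
CARD_RE = re.compile(r"\b(?:[0-9][ -]?){12,18}[0-9]\b")                    # 13-19 digits
IPV4_RE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")
PHONE_RE = re.compile(r"\+[0-9]{1,3}(?:[ .-]?[0-9]{1,4}){3,6}(?![0-9])")   # "+<country code> ..."


def _iban_ok(s: str) -> bool:
    # MOD-97: move the first four characters to the end, map A-Z to 10-35, expect remainder 1.
    rearranged = s[4:] + s[:4]
    return int("".join(str(int(ch, 36)) for ch in rearranged)) % 97 == 1


def _luhn_ok(s: str) -> bool:
    # Luhn: double every second digit from the right (minus 9 if over 9); total must end in 0.
    digits = [int(c) for c in s if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def _ipv4_ok(s: str) -> bool:
    # Reject matches such as 999.999.999.999 that the bare regex accepts.
    return all(int(part) <= 255 for part in s.split("."))


def _mask_if(validator, mask):
    """Build a re.sub callback that masks a match only if `validator` accepts it."""
    return lambda m: mask if validator(m.group(0)) else m.group(0)


def anonymize(text: str) -> str:
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = IBAN_RE.sub(_mask_if(_iban_ok, "[IBAN]"), text)
    text = CARD_RE.sub(_mask_if(_luhn_ok, "[CREDIT_CARD]"), text)
    text = IPV4_RE.sub(_mask_if(_ipv4_ok, "[IP]"), text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


print(anonymize("Contact John at john.doe@example.com or +33 6 12 34 56 78. "
                "IBAN: FR7630006000011234567890189. IP: 192.168.1.42"))
# Contact John at [EMAIL] or [PHONE]. IBAN: [IBAN]. IP: [IP]
print(anonymize("The quarterly report shows a 15% increase in revenue compared to last year."))
# The quarterly report shows a 15% increase in revenue compared to last year.
```

Running the IBAN and card patterns before the phone pattern keeps long digit runs from being mislabeled as phone numbers; a real tool would also need broader phone formats and NER for names.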
