IP Text Extractor
Extract IP addresses from free text
IP text extraction scans unstructured text (logs, configs, emails) and identifies all IPv4 and IPv6 addresses using regex pattern matching.
What is IP Extraction?
IP text extraction is the process of scanning unstructured text — log files, configuration dumps, emails, incident reports, or any freeform document — and identifying all IPv4 and IPv6 addresses embedded within it. The extracted addresses can then be validated, deduplicated, sorted, or fed into other tools for further analysis.
This is a common task in security operations, system administration, and data analysis. When you receive a 10,000-line access log or a lengthy incident report, manually spotting every IP address is impractical. An extraction tool automates this in milliseconds, producing a clean list of all addresses found in the text.
The core technique is regular expression (regex) pattern matching. The tool applies carefully crafted regex patterns that match the structural format of IP addresses, then returns all matches with their positions in the text.
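As a rough illustration of "return all matches with their positions", the sketch below uses Python's re.finditer. The function name find_ipv4_candidates and the sample sentence are made up for this example; the pattern is deliberately permissive and does no range checking.

```python
import re

# Permissive IPv4 candidate pattern: four dot-separated groups of 1-3 digits.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def find_ipv4_candidates(text):
    """Return (matched_text, start, end) for every candidate in `text`."""
    return [(m.group(0), m.start(), m.end())
            for m in IPV4_CANDIDATE.finditer(text)]

# Illustrative input, not taken from a real log.
sample = "Failed login from 198.51.100.7 via gateway 10.0.0.1"
for address, start, end in find_ipv4_candidates(sample):
    print(f"{address} at [{start}:{end}]")
```

Keeping the start and end offsets makes it possible to highlight matches in the original text or to inspect the characters around each one later.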
Regex Patterns for IPv4 and IPv6
IPv4 Pattern
A basic IPv4 regex matches four groups of 1-3 digits separated by dots:
\b(\d{1,3}\.){3}\d{1,3}\b
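This pattern catches strings like 192.168.1.1, 10.0.0.255, and 0.0.0.0. The \b word-boundary anchors stop the pattern from matching inside a longer run of digits or letters (for example, nothing in 1234.5.6.7 matches). They do not guard against longer dotted sequences: in 1.2.3.4.5 the pattern still matches 1.2.3.4, because the position between a digit and a dot counts as a word boundary.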
A stricter pattern validates the octet range (0-255) within the regex itself:
\b(25[0-5]|2[0-4]\d|[01]?\d\d?)(\.(25[0-5]|2[0-4]\d|[01]?\d\d?)){3}\b
This rejects strings like 999.999.999.999 at the regex level rather than in a separate validation step.
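To see the difference between the two patterns, the following sketch runs both over the same sentence. The variable names and the test sentence are illustrative only; the patterns are the ones shown above, rewritten with non-capturing groups so that findall returns whole matches.

```python
import re

# Basic pattern: any four groups of 1-3 digits.
BASIC = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Strict pattern: each octet must fall in the range 0-255.
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"
STRICT = re.compile(rf"\b{OCTET}(?:\.{OCTET}){{3}}\b")

text = "valid 10.0.0.255, invalid 999.999.999.999 and 256.1.1.1"

basic_hits = BASIC.findall(text)    # includes the out-of-range strings
strict_hits = STRICT.findall(text)  # keeps only 10.0.0.255

print("basic: ", basic_hits)
print("strict:", strict_hits)
```

The strict pattern is harder to read and maintain, so a permissive pattern followed by a separate validation step is an equally reasonable design.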
IPv6 Pattern
IPv6 extraction is significantly more complex because of the multiple valid representations:
- Full form: 2001:0db8:85a3:0000:0000:8a2e:0370:7334
- Compressed: 2001:db8:85a3::8a2e:370:7334
- Mixed (IPv4-mapped): ::ffff:192.168.1.1
- Loopback: ::1
A comprehensive IPv6 regex must account for all these variants, making it considerably longer than the IPv4 pattern. Most extraction tools use a combination of patterns or a multi-pass approach: first extract candidates with a permissive pattern, then validate the structure in a second pass.
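The multi-pass approach could be sketched as follows, assuming Python and its standard ipaddress module for the second pass. The candidate pattern and the function name extract_ipv6 are assumptions made for this example, not the patterns any particular tool uses.

```python
import ipaddress
import re

# Pass 1: permissive candidates. Any run of hex digits, colons and dots
# that is not glued to a neighbouring word character or colon.
CANDIDATE = re.compile(r"(?<![\w:.])[0-9A-Fa-f:.]+(?![\w:])")

def extract_ipv6(text):
    """Return the IPv6 addresses found in `text`, as written in the text."""
    found = []
    for match in CANDIDATE.finditer(text):
        token = match.group(0).rstrip(".")  # drop sentence-ending periods
        if token.count(":") < 2:
            continue  # every IPv6 form contains at least two colons
        try:
            # Pass 2: let the standard library validate the structure.
            ipaddress.IPv6Address(token)
        except ValueError:
            continue  # e.g. timestamps (13:55:36) or MAC addresses
        found.append(token)
    return found

# Illustrative input covering the four forms listed above.
text = ("Peers: 2001:0db8:85a3:0000:0000:8a2e:0370:7334, "
        "2001:db8:85a3::8a2e:370:7334, ::ffff:192.168.1.1 and ::1. "
        "Logged at 13:55:36.")
print(extract_ipv6(text))
```

Delegating validation to a parser keeps the regex short. The trade-off is that the first pass also picks up timestamps and MAC addresses, which the second pass then has to discard.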
False Positives and How to Handle Them
The biggest challenge in IP extraction is false positives — strings that match the regex pattern but are not actual IP addresses. The most common offenders include:
- Software version numbers: OpenSSL 1.1.1.4, Python 3.9.7.1, v2.0.0.1
- Document section numbers: Section 1.2.3.4, Figure 10.0.0.1
- Decimal-heavy data: Sensor readings, financial figures, or coordinates that happen to have four dot-separated groups
- Partial matches: Longer numeric strings where a substring happens to match (e.g., a phone number or serial number)
Strategies to reduce false positives:
- Range validation: After extraction, verify that each octet falls within 0-255. This eliminates matches like 999.1.2.3.
- Context analysis: Check the characters before and after the match. IP addresses in logs typically follow predictable patterns (start of line, after whitespace, inside brackets).
- Deduplication and frequency: In a large log file, real IP addresses tend to appear multiple times. A “version number” like 1.2.3.4 appearing once is likely not an IP.
- Known-range filtering: Filter results against known private, loopback, and reserved ranges depending on whether you expect to see them.
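Three of these strategies (range validation, context analysis, and known-range filtering) can be combined in a few lines. The sketch below is one possible arrangement in Python; the keyword list in VERSION_HINT, the 12-character look-behind window, and the keep_private parameter are all assumptions chosen for illustration and would need tuning for real data.

```python
import ipaddress
import re

CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Hypothetical hint words that often precede a version or section number.
VERSION_HINT = re.compile(r"\b(?:version|release|section|figure)\s*$",
                          re.IGNORECASE)

def extract_ipv4(text, keep_private=True):
    """Return IPv4 addresses from `text` after basic false-positive checks."""
    results = []
    for m in CANDIDATE.finditer(text):
        before = text[max(0, m.start() - 12):m.start()]
        after = text[m.end():m.end() + 2]
        # Context analysis: skip "version 1.2.3.4" and longer dotted
        # runs such as 1.2.3.4.5.
        if VERSION_HINT.search(before) or re.match(r"\.\d", after):
            continue
        try:
            ip = ipaddress.IPv4Address(m.group(0))  # range validation
        except ValueError:
            continue
        # Known-range filtering, applied only when the caller asks for it.
        if not keep_private and (ip.is_private or ip.is_loopback
                                 or ip.is_reserved):
            continue
        results.append(str(ip))
    return results

# Illustrative input mixing real-looking addresses with false positives.
text = ("Upgraded to version 1.2.3.4; client 8.8.8.8 hit 10.0.0.1, "
        "bad 999.1.2.3, build 1.2.3.4.5")
print(extract_ipv4(text))
print(extract_ipv4(text, keep_private=False))
```

Keyword-based context checks are heuristics: they can also discard a genuine address that happens to follow a word like "section", so it is worth reviewing what gets filtered out.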
Common Use Cases
- Log analysis: Extracting client IPs from web server access logs (Apache, Nginx, IIS) for traffic analysis, geographic breakdown, or abuse detection
- Security incident response: Pulling all IP addresses from a security alert, email header, or threat intelligence report to check against blocklists and reputation databases
- Firewall audit: Extracting IPs from exported firewall configurations to verify that rules reference valid, expected addresses
- Migration planning: Scanning configuration files, documentation, and scripts for hardcoded IP addresses that need to be updated when migrating to a new network
- Compliance checks: Identifying IP addresses in documents, chat logs, or support tickets that may indicate data exposure or unauthorized access
- Network inventory: Extracting IPs from discovery scan output, SNMP data, or ARP tables to build or update a network asset inventory
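For the log analysis case, a minimal sketch might count requests per client address. It assumes the client address is the first space-delimited field of each line, as in Apache's default access log formats; the function name count_client_ips and the sample lines are invented for this example.

```python
import ipaddress
from collections import Counter

def count_client_ips(log_lines):
    """Count how often each client address appears as the first field."""
    counts = Counter()
    for line in log_lines:
        first_field = line.split(" ", 1)[0]
        try:
            ip = ipaddress.ip_address(first_field)  # accepts IPv4 and IPv6
        except ValueError:
            continue  # hostname, blank line, or malformed entry
        counts[str(ip)] += 1
    return counts

# Made-up sample lines for illustration.
sample_log = [
    '203.0.113.42 - frank [10/Oct/2024:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2024:13:55:40 -0700] "GET /about.html HTTP/1.1" 200 512',
    '203.0.113.42 - - [10/Oct/2024:13:56:01 -0700] "GET /style.css HTTP/1.1" 200 1024',
    '',
]
for address, hits in count_client_ips(sample_log).most_common():
    print(address, hits)
```

Reading a fixed field is more reliable than regex extraction when the log format is known, because a URL or user-agent string later in the line cannot produce a false match.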
Try These Examples
A standard Apache Common Log Format entry. The extractor identifies 203.0.113.42 as the client IP address at the beginning of the line.
203.0.113.42 - frank [10/Oct/2024:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
The strings 1.2.3.4 and 3.1.0.5 match the IPv4 regex pattern (four dot-separated numbers) but are actually software version numbers. This demonstrates why extracted IPs should always be validated.
Upgraded OpenSSL from version 1.2.3.4 to 3.1.0.5 on the production servers.