Log Parser
Parse and filter log files
Log parsing extracts structured data from log files. Common formats: Apache/Nginx access logs, syslog, JSON structured logs.
What is Log Parsing?
Log parsing is the process of extracting structured data from log files — transforming raw text lines into named fields (timestamp, IP address, status code, message) that can be searched, filtered, and analyzed.
Applications generate logs to record events: HTTP requests, errors, authentication attempts, database queries, system state changes. These logs are essential for debugging, monitoring, security auditing, and performance analysis. But raw log files are just lines of text — to make them useful, you need to parse them into structured data.
The challenge is that log formats vary widely. Apache, Nginx, syslog, and application-specific formats each use different layouts, delimiters, and date formats. A log parser must understand the expected format and extract each field by its position and pattern.
Common Log Formats
Apache/Nginx Common Log Format (CLF)
127.0.0.1 - frank [10/Oct/2024:13:55:36 -0700] "GET /page.html HTTP/1.1" 200 2326
| Field | Value | Description |
|---|---|---|
| Remote host | 127.0.0.1 | Client IP address |
| Identity | - | RFC 1413 identity (rarely used) |
| User | frank | Authenticated username (or -) |
| Timestamp | [10/Oct/2024:13:55:36 -0700] | Request time with timezone |
| Request | "GET /page.html HTTP/1.1" | Method, path, and protocol |
| Status | 200 | HTTP response status code |
| Size | 2326 | Response body size in bytes |
The Combined Log Format adds two more fields: the referrer URL and the user agent string.
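The fields in the table above can be captured with a regular expression of named groups. Here is a minimal Python sketch; the group names and the `parse_clf` helper are illustrative choices, not a standard API, and the pattern covers both CLF and the Combined variant via an optional trailing group:

```python
import re

# Named groups mirror the field table above; the trailing optional group
# captures the Combined Log Format's referrer and user agent when present.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_clf(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

entry = parse_clf('127.0.0.1 - frank [10/Oct/2024:13:55:36 -0700] '
                  '"GET /page.html HTTP/1.1" 200 2326')
# entry["status"] == "200", entry["user"] == "frank"
```

Note that every captured value is still a string; converting `status` and `size` to integers and the timestamp to a datetime is a separate step.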
Syslog (RFC 5424)
<34>1 2024-10-10T13:55:36.123Z server1 myapp 1234 - - Application started
Syslog is the standard logging protocol for Unix/Linux systems. It includes a priority value (facility + severity), version, ISO 8601 timestamp, hostname, application name, process ID, message ID, structured data, and the message body.
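The header fields can be split with a similar pattern, and the priority value decomposed into facility and severity. A simplified Python sketch (structured data is reduced to a single `-` or bracketed token; a full RFC 5424 parser must handle multiple SD elements and escaping):

```python
import re

# Minimal RFC 5424 header pattern; field names follow the list above.
SYSLOG = re.compile(
    r'<(?P<pri>\d{1,3})>(?P<version>\d) (?P<timestamp>\S+) (?P<hostname>\S+) '
    r'(?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) (?P<sd>-|\[.*\]) ?(?P<msg>.*)'
)

def parse_syslog(line):
    m = SYSLOG.match(line)
    if not m:
        return None
    d = m.groupdict()
    # priority = facility * 8 + severity, per RFC 5424
    d["facility"], d["severity"] = divmod(int(d.pop("pri")), 8)
    return d

rec = parse_syslog("<34>1 2024-10-10T13:55:36.123Z server1 myapp 1234 - - "
                   "Application started")
# facility 4, severity 2, msg "Application started"
```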
JSON Structured Logs
{"timestamp":"2024-10-10T13:55:36.123Z","level":"INFO","service":"api","message":"Request processed","duration_ms":42,"status":200}
JSON logs are the modern standard. Every field is explicitly named, making parsing trivial — just JSON.parse() each line. No regex needed, no positional ambiguity, and new fields can be added without breaking existing parsers.
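The same per-line approach works in any language; a Python sketch using the standard `json` module, with malformed lines skipped rather than aborting the whole file (the `parse_json_lines` helper is illustrative):

```python
import json

def parse_json_lines(lines):
    """Yield one dict per valid JSON log line, skipping malformed lines."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # not valid JSON: skip this line

log = [
    '{"timestamp":"2024-10-10T13:55:36.123Z","level":"INFO","status":200}',
    'not json at all',
]
records = list(parse_json_lines(log))
# one record survives; records[0]["level"] == "INFO"
```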
Regex vs Structured Logging
Traditional log formats (Apache, syslog) require regex-based parsing. You write a pattern that matches the expected format and captures each field into a named group. This works well for known formats but breaks when logs contain unexpected characters, multi-line messages, or format changes.
Structured logging (JSON, logfmt) eliminates regex entirely. Each log entry is a self-describing data structure. This approach is:
- Easier to parse — Standard JSON/logfmt parsers handle everything
- More robust — No regex maintenance when formats change
- Better for search — Log aggregators can index individual fields
- More extensible — Add fields without breaking existing consumers
The trade-off is readability. A JSON log line is harder to scan with the human eye than a well-formatted text line. Many teams use JSON logging in production (for machine consumption) and human-readable formatting in development.
Log Levels
Most logging frameworks use a hierarchy of severity levels:
| Level | Severity | Use Case |
|---|---|---|
| TRACE | Lowest | Fine-grained debugging (loop iterations, variable values) |
| DEBUG | Low | Diagnostic information for developers |
| INFO | Normal | Routine operational messages (startup, shutdown, requests) |
| WARN | Elevated | Unexpected but recoverable situations (deprecated API used, retry needed) |
| ERROR | High | Failures that affect a single operation but not the whole system |
| FATAL | Critical | Unrecoverable failures that require immediate attention or system shutdown |
In production, logs are typically set to INFO or WARN level to avoid the volume and performance overhead of DEBUG/TRACE output. Lower levels are enabled temporarily during incident investigation.
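Level filtering reduces to comparing numeric ranks. A small Python sketch; the rank values are arbitrary placeholders (real frameworks define their own constants), and `at_least` is a hypothetical helper:

```python
# Numeric ranks for the hierarchy in the table above (values are arbitrary).
LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}

def at_least(records, threshold):
    """Keep parsed records whose level is at or above the threshold."""
    floor = LEVELS[threshold]
    return [r for r in records if LEVELS.get(r.get("level"), -1) >= floor]

logs = [
    {"level": "DEBUG", "msg": "cache miss"},
    {"level": "INFO",  "msg": "request ok"},
    {"level": "ERROR", "msg": "db timeout"},
]
at_least(logs, "WARN")  # only the ERROR record survives
```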
Common Use Cases
- Incident investigation: When a service fails at 3 AM, logs are the first place engineers look to understand what happened, in what order, and why
- Security auditing: Authentication logs, access logs, and firewall logs reveal unauthorized access attempts, brute-force attacks, and data exfiltration
- Performance monitoring: Parsing request duration fields from access logs identifies slow endpoints and performance regressions
- Compliance: Regulations like PCI-DSS and GDPR require audit trails — parsed logs provide searchable records of data access and system changes
- Alerting: Log aggregators trigger alerts when error rates spike, specific patterns appear, or critical services go down
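As a concrete instance of the monitoring and alerting cases above, once log lines are parsed into records, metrics like the 5xx error rate fall out of a simple aggregation. A Python sketch over parsed records (the `error_rate` helper is hypothetical):

```python
def error_rate(records):
    """Fraction of parsed records with a 5xx status code."""
    statuses = [int(r["status"]) for r in records if "status" in r]
    if not statuses:
        return 0.0
    return sum(1 for s in statuses if s >= 500) / len(statuses)

sample = [{"status": "200"}, {"status": "503"},
          {"status": "200"}, {"status": "500"}]
error_rate(sample)  # 0.5
```

An alerting pipeline would compute this over a sliding time window and fire when the rate crosses a threshold.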
Try These Examples
A standard Apache combined log entry containing: remote IP, identity, user, timestamp, request method/path/protocol, status code, response size, referrer, and user agent. Each field follows a predictable pattern that can be parsed with regex.
192.168.1.1 - frank [10/Oct/2024:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0"
A truncated log line missing the closing bracket on the timestamp, the HTTP protocol version, and all subsequent fields. Log parsers will fail to extract structured data from malformed entries like this.
192.168.1.1 - - [10/Oct/2024:13:55:36 "GET /index.html
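A robust parser should report malformed entries like the truncated line above rather than crash or silently drop them. A Python sketch that separates parsed records from rejects (the pattern is a simplified CLF regex; `parse_file` is a hypothetical helper):

```python
import re

# Simplified Common Log Format pattern with named groups.
CLF = re.compile(r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
                 r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)')

def parse_file(lines):
    """Split lines into parsed records and unparseable rejects."""
    good, bad = [], []
    for line in lines:
        m = CLF.match(line)
        if m:
            good.append(m.groupdict())
        else:
            bad.append(line)  # keep rejects for inspection/alerting
    return good, bad

lines = [
    '192.168.1.1 - - [10/Oct/2024:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '192.168.1.1 - - [10/Oct/2024:13:55:36 "GET /index.html',  # truncated: no closing ]
]
good, bad = parse_file(lines)
# one parsed record, one malformed reject
```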