Log Parser

Parse and filter log files

Understanding Log Formats
TL;DR

Log parsing extracts structured data from log files. Common formats: Apache/Nginx access logs, syslog, JSON structured logs.

What is Log Parsing?

Log parsing is the process of extracting structured data from log files — transforming raw text lines into named fields (timestamp, IP address, status code, message) that can be searched, filtered, and analyzed.

Applications generate logs to record events: HTTP requests, errors, authentication attempts, database queries, system state changes. These logs are essential for debugging, monitoring, security auditing, and performance analysis. But raw log files are just lines of text — to make them useful, you need to parse them into structured data.

The challenge is that log formats vary widely. Apache, Nginx, syslog, and application-specific formats each use different layouts, delimiters, and date formats. A log parser must understand the expected format and extract each field by its position and pattern.

Common Log Formats

Apache/Nginx Common Log Format (CLF)

127.0.0.1 - frank [10/Oct/2024:13:55:36 -0700] "GET /page.html HTTP/1.1" 200 2326
| Field | Value | Description |
| --- | --- | --- |
| Remote host | `127.0.0.1` | Client IP address |
| Identity | `-` | RFC 1413 identity (rarely used) |
| User | `frank` | Authenticated username (or `-`) |
| Timestamp | `[10/Oct/2024:13:55:36 -0700]` | Request time with timezone |
| Request | `"GET /page.html HTTP/1.1"` | Method, path, and protocol |
| Status | `200` | HTTP response status code |
| Size | `2326` | Response body size in bytes |

The Combined Log Format adds two more fields: the referrer URL and the user agent string.
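A CLF line can be parsed with a regex whose named groups mirror the fields in the table above. The pattern below is an illustrative sketch in Python, not a production-grade parser (real-world logs sometimes need a more forgiving pattern):

```python
import re

# Named groups mirror the Common Log Format fields described above.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

line = '127.0.0.1 - frank [10/Oct/2024:13:55:36 -0700] "GET /page.html HTTP/1.1" 200 2326'
match = CLF_PATTERN.match(line)
fields = match.groupdict() if match else {}
print(fields.get("status"), fields.get("size"))  # 200 2326
```

Combined Log Format entries start with the same seven fields, so the same pattern also matches them; extracting the referrer and user agent just needs two extra quoted groups appended.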

Syslog (RFC 5424)

<34>1 2024-10-10T13:55:36.123Z server1 myapp 1234 - - Application started

Syslog is the standard logging protocol for Unix/Linux systems. It includes a priority value (facility + severity), version, ISO 8601 timestamp, hostname, application name, process ID, message ID, structured data, and the message body.
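A minimal parser for the RFC 5424 header can be sketched the same way. The pattern below assumes the structured-data element is `-` (as in the example line); a complete parser must also handle `[id key="value"]` blocks:

```python
import re

# Minimal RFC 5424 header parser (sketch; assumes structured data is "-").
SYSLOG_PATTERN = re.compile(
    r'<(?P<pri>\d{1,3})>(?P<version>\d+) (?P<timestamp>\S+) '
    r'(?P<hostname>\S+) (?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) '
    r'- (?P<message>.*)'
)

line = '<34>1 2024-10-10T13:55:36.123Z server1 myapp 1234 - - Application started'
fields = SYSLOG_PATTERN.match(line).groupdict()

# The priority value encodes facility * 8 + severity.
facility, severity = divmod(int(fields["pri"]), 8)
print(facility, severity)  # 4 2  (security/auth facility, Critical severity)
```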

JSON Structured Logs

{"timestamp":"2024-10-10T13:55:36.123Z","level":"INFO","service":"api","message":"Request processed","duration_ms":42,"status":200}

JSON logs are the modern standard. Every field is explicitly named, making parsing trivial — just JSON.parse() each line. No regex needed, no positional ambiguity, and new fields can be added without breaking existing parsers.
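In Python, the equivalent of `JSON.parse()` is `json.loads()`; one call per line yields a fully named record:

```python
import json

raw = ('{"timestamp":"2024-10-10T13:55:36.123Z","level":"INFO","service":"api",'
       '"message":"Request processed","duration_ms":42,"status":200}')

entry = json.loads(raw)  # one parse call replaces an entire regex
print(entry["level"], entry["duration_ms"])  # INFO 42
```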

Regex vs Structured Logging

Traditional log formats (Apache, syslog) require regex-based parsing. You write a pattern that matches the expected format and captures each field into a named group. This works well for known formats but breaks when logs contain unexpected characters, multi-line messages, or format changes.

Structured logging (JSON, logfmt) eliminates regex entirely. Each log entry is a self-describing data structure. This approach is:

  • Easier to parse — Standard JSON/logfmt parsers handle everything
  • More robust — No regex maintenance when formats change
  • Better for search — Log aggregators can index individual fields
  • More extensible — Add fields without breaking existing consumers

The trade-off is readability. A JSON log line is harder for a human to scan than a well-formatted text line. Many teams use JSON logging in production (for machine consumption) and human-readable formatting in development.
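Logfmt, the other structured format mentioned above, sits between the two: `key=value` pairs that are both machine-parseable and reasonably human-scannable. A simplified parser might look like this (sketch only; it ignores escape sequences inside quoted values):

```python
import re

# One key=value pair; values are either double-quoted or a bare token.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_logfmt(line):
    out = {}
    for m in PAIR.finditer(line):
        key, quoted, bare = m.group(1), m.group(2), m.group(3)
        out[key] = quoted if quoted is not None else bare
    return out

print(parse_logfmt('level=info msg="request processed" duration_ms=42'))
# {'level': 'info', 'msg': 'request processed', 'duration_ms': '42'}
```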

Log Levels

Most logging frameworks use a hierarchy of severity levels:

| Level | Severity | Use Case |
| --- | --- | --- |
| TRACE | Lowest | Fine-grained debugging (loop iterations, variable values) |
| DEBUG | Low | Diagnostic information for developers |
| INFO | Normal | Routine operational messages (startup, shutdown, requests) |
| WARN | Elevated | Unexpected but recoverable situations (deprecated API used, retry needed) |
| ERROR | High | Failures that affect a single operation but not the whole system |
| FATAL | Critical | Unrecoverable failures that require immediate attention or system shutdown |

In production, logs are typically set to INFO or WARN level to avoid the volume and performance overhead of DEBUG/TRACE output. Lower levels are enabled temporarily during incident investigation.
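Because the levels form an ordered hierarchy, filtering reduces to a numeric comparison. A sketch (the numeric values are arbitrary; only their ordering matters):

```python
# Rank levels so "at or above WARN" becomes a simple comparison.
LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}

def filter_by_level(entries, minimum="INFO"):
    threshold = LEVELS[minimum]
    return [e for e in entries if LEVELS.get(e["level"], 0) >= threshold]

entries = [
    {"level": "DEBUG", "message": "cache miss"},
    {"level": "INFO",  "message": "request handled"},
    {"level": "ERROR", "message": "upstream timeout"},
]
print([e["level"] for e in filter_by_level(entries, "WARN")])  # ['ERROR']
```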

Common Use Cases

  • Incident investigation: When a service fails at 3 AM, logs are the first place engineers look to understand what happened, in what order, and why
  • Security auditing: Authentication logs, access logs, and firewall logs reveal unauthorized access attempts, brute-force attacks, and data exfiltration
  • Performance monitoring: Parsing request duration fields from access logs identifies slow endpoints and performance regressions
  • Compliance: Regulations like PCI-DSS and GDPR require audit trails — parsed logs provide searchable records of data access and system changes
  • Alerting: Log aggregators trigger alerts when error rates spike, specific patterns appear, or critical services go down

Try These Examples

Apache Combined Log Format (valid)

A standard Apache combined log entry containing: remote IP, identity, user, timestamp, request method/path/protocol, status code, response size, referrer, and user agent. Each field follows a predictable pattern that can be parsed with regex.

192.168.1.1 - frank [10/Oct/2024:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0"
Malformed Log Entry (invalid)

A truncated log line missing the closing bracket on the timestamp, the HTTP protocol version, and all subsequent fields. Log parsers will fail to extract structured data from malformed entries like this.

192.168.1.1 - - [10/Oct/2024:13:55:36 "GET /index.html
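Both examples can be run through the same hypothetical parser. A robust design returns `None` (or routes the raw line to a dead-letter stream) for malformed input instead of raising an exception:

```python
import re

# Common Log Format regex with named groups (illustrative sketch).
CLF = re.compile(
    r'(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    m = CLF.match(line)
    return m.groupdict() if m else None  # None, not an exception

good = ('192.168.1.1 - frank [10/Oct/2024:13:55:36 -0700] '
        '"GET /index.html HTTP/1.1" 200 2326')
bad = '192.168.1.1 - - [10/Oct/2024:13:55:36 "GET /index.html'

print(parse_line(good)["status"])  # 200
print(parse_line(bad))             # None (timestamp bracket never closes)
```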