Text Comparator

Compare two texts side by side

Understanding Text Comparison (Diff)
TL;DR

Text diff highlights the exact differences between two versions of a text, line by line. It is the same family of algorithms that powers git diff and code review tools.

What is Diff?

A diff (short for “difference”) is an algorithm that compares two texts and identifies the exact changes between them. The output shows which lines were added, removed, or modified, making it easy to understand what changed between two versions of a document.

Diff is one of the most fundamental tools in software engineering. Every code review, every git diff, every “track changes” feature in a document editor relies on diff algorithms. The concept was pioneered by Douglas McIlroy at Bell Labs, and the first Unix diff utility was published in 1974.

The power of diff lies in its precision. Instead of vaguely telling you “these files are different,” it shows you exactly which lines changed, in what way, and in what order. This makes diff essential for collaborative work where multiple people modify the same documents.

The LCS Algorithm

Most diff implementations are based on the Longest Common Subsequence (LCS) algorithm. LCS finds the longest sequence of lines that appear in both texts in the same order (but not necessarily consecutively). Everything not in the LCS is a difference.

The classic algorithm works as follows:

  1. Build a matrix where rows represent lines in text A and columns represent lines in text B
  2. Fill the matrix using dynamic programming: the cell at row i, column j records the length of the LCS of the first i lines of A and the first j lines of B
  3. Backtrack through the matrix to identify which lines are common (unchanged) and which are unique to each text (additions or deletions)

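The result is a minimal set of changes that transforms text A into text B. “Minimal” means the algorithm finds the fewest possible line additions and deletions. A line-based diff has no separate “modified” operation: a changed line is reported as a deletion of the old line followed by an addition of the new one, and it is the display layer that pairs the two up as a modification.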
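The three steps can be sketched in Python as follows. This is a minimal illustration of the classic dynamic-programming approach, not the code behind any particular tool; the function name lcs_diff and the (tag, line) output shape are assumptions made for this example.

```python
def lcs_diff(a_lines, b_lines):
    """Return (tag, line) pairs: ' ' unchanged, '-' only in A, '+' only in B."""
    n, m = len(a_lines), len(b_lines)

    # Steps 1 and 2: table[i][j] is the LCS length of a_lines[:i] and b_lines[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a_lines[i - 1] == b_lines[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Step 3: backtrack from the bottom-right corner to the top-left.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a_lines[i - 1] == b_lines[j - 1]:
            ops.append((" ", a_lines[i - 1]))      # common line
            i -= 1
            j -= 1
        elif j > 0 and (i == 0 or table[i][j - 1] >= table[i - 1][j]):
            ops.append(("+", b_lines[j - 1]))      # line only in B
            j -= 1
        else:
            ops.append(("-", a_lines[i - 1]))      # line only in A
            i -= 1

    ops.reverse()  # backtracking collects the operations last-to-first
    return ops


old = ["port=8080", "host=localhost", "debug=false"]
new = ["port=8080", "host=0.0.0.0", "debug=true", "log_level=info"]
for tag, line in lcs_diff(old, new):
    print(tag + line)
```

The table holds (n+1) x (m+1) integers, so memory grows with the product of the two line counts, which is one reason production tools prefer other algorithms for large inputs.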

The time and space complexity of this approach is O(n*m), where n and m are the number of lines in each text. For very large files, optimized algorithms like Myers’ diff (the default in Git) reduce the running time to O((n+m)*d), where d is the size of the minimal edit script (added plus removed lines), making it much faster when the texts are mostly similar.
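To make the role of d concrete, here is a sketch of the greedy core of Myers' algorithm that computes only d, the number of added plus removed lines in a shortest edit script. It is an illustration under simplifying assumptions, not Git's implementation, which adds refinements and heuristics on top; the function name myers_distance is hypothetical.

```python
def myers_distance(a, b):
    """Size of the shortest edit script (insertions + deletions) from a to b."""
    n, m = len(a), len(b)
    # furthest[k] is the largest x reached so far on diagonal k, where k = x - y.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Pick the better neighbouring diagonal to extend from.
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]       # move down: take a line from b
            else:
                x = furthest[k - 1] + 1   # move right: drop a line from a
            y = x - k
            # Follow the diagonal for free while lines match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: d = n + m always suffices


old = ["port=8080", "host=localhost", "debug=false"]
new = ["port=8080", "host=0.0.0.0", "debug=true", "log_level=info"]
print(myers_distance(old, new))  # 5: two removed lines plus three added lines
```

The outer loop stops as soon as the end of both texts is reached, so nearly identical inputs finish after only a few rounds regardless of their length.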

Output Formats

Diff results can be displayed in several formats:

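Unified diff (most common) shows changes with context lines. Added lines are prefixed with +, removed lines with -, and unchanged context lines with a single space. This is the format used by git diff, where each block of changes is also preceded by file headers (--- and +++) and a hunk header (@@ ... @@), omitted here: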

 port=8080
-host=localhost
-debug=false
+host=0.0.0.0
+debug=true
+log_level=info
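The same output can be produced with Python's standard difflib module. This is a small sketch rather than a description of how this tool works internally; the file names a/app.conf and b/app.conf are placeholders chosen for the example.

```python
import difflib

old = ["port=8080", "host=localhost", "debug=false"]
new = ["port=8080", "host=0.0.0.0", "debug=true", "log_level=info"]

# lineterm="" because the input lines carry no trailing newlines.
diff_lines = list(
    difflib.unified_diff(
        old,
        new,
        fromfile="a/app.conf",  # placeholder name
        tofile="b/app.conf",    # placeholder name
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

The first three lines of the result are the two file headers and the hunk header (@@ -1,3 +1,4 @@), followed by the six body lines shown above.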

Side-by-side diff shows both texts in parallel columns with changes highlighted. This format is used by many GUI diff tools and code review interfaces because it is easier to visually scan.
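A side-by-side layout can be derived from the same line-level comparison by pairing up the lines of each changed block. The sketch below uses difflib.SequenceMatcher from the Python standard library; the function name side_by_side and the column markers (| changed, < only on the left, > only on the right, borrowed from the convention of diff --side-by-side) are choices made for this example.

```python
import difflib
from itertools import zip_longest


def side_by_side(a_lines, b_lines):
    """Return (left, marker, right) rows for a two-column view."""
    rows = []
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Pair lines within the block; the shorter side is padded with None.
        for left, right in zip_longest(a_lines[i1:i2], b_lines[j1:j2]):
            if tag == "equal":
                marker = " "
            elif left is None:
                marker = ">"   # line exists only in B
            elif right is None:
                marker = "<"   # line exists only in A
            else:
                marker = "|"   # line differs between A and B
            rows.append((left or "", marker, right or ""))
    return rows


old = ["port=8080", "host=localhost", "debug=false"]
new = ["port=8080", "host=0.0.0.0", "debug=true", "log_level=info"]
for left, marker, right in side_by_side(old, new):
    print(f"{left:<16} {marker} {right}")
```

Pairing lines by position inside a changed block is a simplification; GUI tools often apply extra similarity matching to decide which old line belongs next to which new line.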

Inline diff highlights character-level differences within changed lines. Instead of showing the entire old and new line, it marks exactly which characters changed — useful for long lines where only a small portion differs.
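One way to get character-level highlighting is to run the comparison again on the characters of a changed line pair. The sketch below does this with difflib.SequenceMatcher; the function name inline_diff and the [-removed-] / {+added+} markers are assumptions for illustration, and a real interface would typically use colors instead.

```python
import difflib


def inline_diff(old_line, new_line):
    """Mark removed text as [-...-] and added text as {+...+} within one line."""
    parts = []
    matcher = difflib.SequenceMatcher(None, old_line, new_line, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(old_line[i1:i2])
            continue
        if i1 < i2:  # characters present only in the old line
            parts.append("[-" + old_line[i1:i2] + "-]")
        if j1 < j2:  # characters present only in the new line
            parts.append("{+" + new_line[j1:j2] + "+}")
    return "".join(parts)


print(inline_diff("host=localhost", "host=0.0.0.0"))
# host=[-localhost-]{+0.0.0.0+}
```

Character-level matching can produce fragmented results when two lines share only scattered letters, which is why some tools fall back to word-level granularity.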

Common Use Cases

  • Code review: Reviewing pull requests and merge requests by examining exactly what changed in each file, line by line
  • Configuration auditing: Comparing production config files against staging or baseline configs to identify unauthorized changes
  • Document versioning: Tracking changes between document revisions — contracts, specifications, policies — without relying on Word’s track changes
  • Database migration verification: Comparing schema dumps before and after migration to confirm only intended changes were applied
  • Debugging regressions: Diffing logs, API responses, or test outputs between a working version and a broken version to isolate the change that caused the regression

Try These Examples

Two Config File Versions

The diff shows 'port=8080' unchanged, 'host' changed from localhost to 0.0.0.0, 'debug' changed from false to true, and 'log_level=info' added. Read as edits, that is two modifications and one addition; a line-based diff records it as two removed lines and three added lines.

Version A: 'port=8080\nhost=localhost\ndebug=false' vs Version B: 'port=8080\nhost=0.0.0.0\ndebug=true\nlog_level=info'

Identical Texts

When both texts are identical, the diff produces no output — no additions, no deletions, no modifications. This confirms the two versions are exactly the same.

Both inputs: 'The quick brown fox jumps over the lazy dog.'
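
The empty result is easy to confirm programmatically. As a quick check using Python's standard difflib module (an illustration, not this tool's own code), an identical pair yields no diff lines at all, not even headers:

```python
import difflib

text = "The quick brown fox jumps over the lazy dog."
a_lines = text.splitlines()
b_lines = text.splitlines()

diff_lines = list(difflib.unified_diff(a_lines, b_lines, lineterm=""))
print(len(diff_lines))  # 0
```

This makes an empty diff a convenient equality test in scripts: any output at all means the two versions differ.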