Text Cleaner & Formatter

What is a Text Cleaner?

A text cleaner removes unwanted formatting, characters, and structure from raw text so you can reuse it cleanly elsewhere. Whether you're stripping HTML tags from scraped content, normalizing line endings between Windows and Unix files, collapsing extra whitespace from a copy-paste, or straightening curly quotes before inserting into a database, a text cleaner handles the tedious transformations in one click. This free online tool processes everything locally in your browser: no text is ever sent to a server.

When to use each option

Trim lines is useful after copy-pasting from PDFs or spreadsheets that add trailing spaces. Collapse spaces fixes double-spaced sentences from word processors. Remove blank lines condenses content heavy with empty rows. Deduplicate lines is handy for log files, keyword lists, or CSV exports with repeated rows. Normalize line endings resolves the classic CRLF issue when sharing files between Windows and macOS/Linux systems.
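The line-level options above can be sketched in a few lines of Python. This is an illustrative approximation, not the tool's actual implementation: the function names are hypothetical, and the assumption that Trim lines strips both leading and trailing whitespace is ours.

```python
import re


def trim_lines(text: str) -> str:
    """Strip leading and trailing whitespace from every line (assumed behavior)."""
    return "\n".join(line.strip() for line in text.split("\n"))


def collapse_spaces(text: str) -> str:
    """Replace each run of spaces or tabs with a single space; newlines are untouched."""
    return re.sub(r"[ \t]+", " ", text)


def remove_blank_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.split("\n") if line.strip())


def deduplicate_lines(text: str) -> str:
    """Keep the first occurrence of each line and preserve the original order."""
    seen = set()
    kept = []
    for line in text.split("\n"):
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


raw = "apple  \n\n  banana   split\napple\n\nbanana split"
cleaned = deduplicate_lines(remove_blank_lines(collapse_spaces(trim_lines(raw))))
print(cleaned)
# apple
# banana split
```

Order matters in this sketch: trimming and collapsing run before deduplication so that lines differing only in whitespace are treated as duplicates.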

Frequently Asked Questions

What are smart quotes and why do they cause problems?

Smart quotes (also called curly or typographic quotes) are the curved quotation marks that word processors and design tools insert automatically: ‘ ’ “ ”. They're Unicode characters outside the basic ASCII range. When pasted into code, config files, JSON, SQL queries, or terminal commands, they break syntax because parsers expect straight ASCII quotes (' "). Use the "Straighten smart quotes" option to convert them back to safe ASCII equivalents.
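As a rough illustration of the conversion, the following Python sketch maps the four curly quote characters to their ASCII counterparts and shows why it matters for JSON. The name straighten_quotes is hypothetical, and the tool itself may cover additional characters.

```python
import json

# Translation table: the four curly quotes mapped to straight ASCII quotes.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ASCII quotes."""
    return text.translate(SMART_QUOTES)


pasted = "{\u201cname\u201d: \u201cAda\u201d}"  # JSON pasted from a word processor

try:
    json.loads(pasted)
    parsed_before_fix = True
except json.JSONDecodeError:
    parsed_before_fix = False  # curly quotes are not valid JSON string delimiters

record = json.loads(straighten_quotes(pasted))
print(parsed_before_fix, record)
```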

Why should I normalize line endings?

Windows uses CRLF (\r\n, two characters) to end lines, while macOS and Linux use LF (\n, one character). Files edited on one system and opened on another can show extra blank lines, stray ^M characters, or whole-file diff noise in version control. Normalizing to LF is the convention for most open-source projects and Unix-style tooling. Git's core.autocrlf setting can convert line endings automatically inside a repository, but manual text processing often requires explicit normalization.
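Explicit normalization to LF is a two-step replacement. The sketch below is a minimal Python version; the function name is hypothetical, and converting lone \r characters (the classic Mac OS convention) is an extra assumption not stated above.

```python
def normalize_line_endings(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first; otherwise the lone-CR pass would turn "\r\n" into "\n\n".
    return text.replace("\r\n", "\n").replace("\r", "\n")


mixed = "first\r\nsecond\rthird\nfourth"
normalized = normalize_line_endings(mixed)
print(repr(normalized))
# 'first\nsecond\nthird\nfourth'
```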

What are HTML entities and when do I need to decode them?

HTML entities are escape sequences used to represent characters that have special meaning in HTML or that fall outside safe ASCII. For example, &amp; represents &, &lt; represents <, and &nbsp; is a non-breaking space. When scraping HTML or exporting content from a CMS, you often receive entity-encoded text that needs decoding before further processing, inserting into a database, or feeding into an API that doesn't understand HTML encoding.
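For comparison, Python's standard library decodes entities with html.unescape. The sample string below is made up for illustration; note that the non-breaking space decodes to U+00A0, not an ordinary space.

```python
import html

# Entity-encoded text as it might arrive from a scraper or CMS export.
encoded = "Fish &amp; Chips &lt;b&gt;fresh&lt;/b&gt;&nbsp;daily"

decoded = html.unescape(encoded)
print(repr(decoded))
# 'Fish & Chips <b>fresh</b>\xa0daily'
```

Decoding can reintroduce literal < and > characters, so text that is later placed back into HTML needs to be escaped again.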

Is it safe to use Strip HTML on untrusted input?

This tool uses DOMPurify under the hood with all tags and attributes disallowed. DOMPurify is one of the most robust HTML sanitization libraries available and correctly handles malformed HTML, nested tags, and XSS injection attempts. However, for server-side sanitization of user-generated content in Laravel, always use a server-side library too: strip_tags() for basic stripping, or the league/html-to-markdown package if you need structure preserved. Client-side sanitization is for convenience, not security.
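To make the idea of "all tags and attributes disallowed" concrete, here is a small Python sketch that keeps only text nodes, using the standard library's HTMLParser. It is not DOMPurify and not a security sanitizer. The class and function names are hypothetical, and dropping the contents of script and style elements is a choice made for this sketch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes and ignore every tag and attribute."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is not script or style content.
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(markup: str) -> str:
    """Return the text content of markup with all tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


sample = '<p onclick="x()">Hello <b>world</b><script>alert(1)</script>!</p>'
print(strip_html(sample))
# Hello world!
```

Because character references are decoded along the way, the result is plain text and still has to be escaped before it is inserted into an HTML page.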

What counts as punctuation when using Remove Punctuation?

The remove punctuation option strips any character that is not a word character (\w: letters, digits, underscore) or whitespace. This includes periods, commas, exclamation marks, question marks, hyphens, parentheses, brackets, slashes, and most symbols. It's useful for preparing text for natural language processing, word frequency analysis, or keyword extraction where punctuation would pollute the token list. Note that if you also enable Remove Numbers, digits are stripped separately in a later pass.
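The rule described above corresponds to the regular expression [^\w\s]. The Python sketch below shows its effect; the function names are hypothetical. One caveat: in Python 3, \w matches Unicode letters by default, whereas a browser-side regex may treat \w as ASCII-only, so accented letters could be handled differently by the tool.

```python
import re


def remove_punctuation(text: str) -> str:
    """Delete every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", text)


def remove_numbers(text: str) -> str:
    """Delete digits in a separate pass, mirroring the Remove Numbers option."""
    return re.sub(r"\d", "", text)


sentence = "Hello, world! It's a well-known (fact)."
print(remove_punctuation(sentence))
# Hello world Its a wellknown fact

# Underscores survive because \w includes them.
print(remove_punctuation("snake_case; kebab-case"))
# snake_case kebabcase
```

Apostrophes and hyphens are removed outright, not replaced with spaces, so "It's" becomes "Its" and "well-known" becomes "wellknown". This can matter for downstream tokenization.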