CSV Validator & Linter
Validate, lint, and inspect CSV documents. Detects inconsistent column counts, unmatched quotes, duplicate headers, and empty rows, and shows per-column statistics.
Column Statistics
| Column | Inferred Type | Null % | Sample Values |
|---|---|---|---|
What is CSV Validation and Linting?
CSV validation checks that your comma-separated values file follows a consistent structure: the same number of columns in every row, no duplicate headers, no unclosed quotes. CSV linting goes further and surfaces warnings about data quality issues, such as empty rows, leading whitespace in headers, or mixed line endings, that won't cause a parse error but will cause silent bugs in your import pipeline. Use this free online tool to catch these issues before loading data into a database, spreadsheet, or data pipeline.
RFC 4180: the CSV standard
RFC 4180 defines the most widely accepted CSV format: fields are separated by commas, records end with CRLF, and fields containing commas, double-quotes, or line breaks must be wrapped in double-quotes. A double-quote inside a quoted field is escaped by doubling it (""). The first record may optionally be a header row. Many real-world CSV files deviate from this spec — they use LF instead of CRLF, omit quoting, or use semicolons — which is why validation tools are essential before processing CSV programmatically.
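As a quick illustration of these quoting rules, Python's standard csv module produces RFC 4180-style output; this sketch sets the CRLF terminator explicitly (it is also the module's default) and lets minimal quoting wrap only the fields that need it:

```python
import csv
import io

# Write one record per RFC 4180: CRLF record terminator, fields containing
# commas or double-quotes wrapped in double-quotes, inner quotes doubled.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\r\n", quoting=csv.QUOTE_MINIMAL)
writer.writerow(["plain", 'has "quotes"', "has,comma"])

print(repr(buf.getvalue()))
# 'plain,"has ""quotes""","has,comma"\r\n'
```

Note that the plain field is left unquoted, the embedded double-quote is escaped by doubling it, and the field containing a comma is wrapped, exactly as the spec requires.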
Frequently Asked Questions
What is CSV linting and why does it matter?
CSV linting is the process of automatically checking a CSV file for structural problems and data quality issues beyond basic parse errors. Common issues include rows with too few or too many columns, duplicate header names, empty rows, and headers with accidental whitespace. These problems don't always cause an error at parse time — they cause wrong data to be silently inserted into the wrong columns, which is much harder to debug later.
What are the most common CSV errors?
The most frequent CSV errors are:

- Inconsistent column counts: a row has more or fewer fields than the header.
- Unclosed quotes: a double-quote opens a field but the closing quote is missing or misplaced.
- Duplicate headers: two columns share the same name, causing one to silently overwrite the other during import.
- BOM characters: Excel adds a UTF-8 byte order mark that breaks the first header name.
- Mixed line endings: CRLF and LF mixed in the same file.
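Two of these are easy to catch in a few lines of Python: decoding with the `utf-8-sig` codec strips a leading UTF-8 BOM, and a `Counter` over the header row flags duplicate names. The sample bytes below are invented for illustration:

```python
import csv
import io
from collections import Counter

# Simulate a file saved by Excel with a UTF-8 BOM and a duplicated header.
raw = b"\xef\xbb\xbfid,name,id\r\n1,Alice,7\r\n"

# utf-8-sig strips the BOM, so the first header stays "id", not "\ufeffid".
text = raw.decode("utf-8-sig")
header = next(csv.reader(io.StringIO(text)))

duplicates = [name for name, n in Counter(header).items() if n > 1]
print(header)      # ['id', 'name', 'id']
print(duplicates)  # ['id']
```

Had the file been decoded with plain `utf-8`, the BOM would survive as an invisible `\ufeff` prefix on the first header name.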
How do I fix rows with inconsistent column counts?
First identify which rows have the wrong count — this tool shows the exact line number and actual vs. expected column count. Common causes are: an unquoted comma inside a value (fix by wrapping the field in double-quotes), a missing trailing comma, or a row that was manually edited and had a field accidentally deleted. In PHP, str_getcsv() and fgetcsv() return arrays you can check with count(). In Python, csv.DictReader does not raise an error for short or long rows: extra fields are collected under the restkey key and missing fields are filled with restval, so you have to check field counts yourself.
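A minimal sketch of that count check in Python, using an invented inline sample with one short row and one long row:

```python
import csv
import io

sample = (
    "id,name,email\r\n"
    "1,Alice,a@example.com\r\n"
    "2,Bob\r\n"                      # too few fields
    "3,Cara,c@example.com,extra\r\n" # too many fields
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)
expected = len(header)

# Collect (line number, actual field count) for every row that deviates
# from the header's column count. Line 1 is the header, data starts at 2.
problems = [
    (line_no, len(row))
    for line_no, row in enumerate(reader, start=2)
    if len(row) != expected
]
print(problems)  # [(3, 2), (4, 4)]
```

The same check works with fgetcsv() in PHP: compare count($row) against the header's count on every iteration.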
What does the null rate percentage mean?
The null rate is the percentage of data rows (excluding the header) where that column is empty. A 0% null rate means every row has a value in that column. A 100% null rate means the column is entirely empty across all rows. High null rates can indicate optional fields, data export issues, or columns that were added to the schema but not yet populated. Use this metric to decide whether to add NOT NULL constraints when importing to a database.
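The metric itself can be sketched in a few lines of Python; the sample data is invented, and real code should also guard against rows shorter than the header:

```python
import csv
import io

sample = "id,name,nickname\r\n1,Alice,\r\n2,Bob,\r\n3,,\r\n"
rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

# Null rate per column: percentage of data rows where the field is empty.
null_rates = {
    name: sum(1 for row in data if row[i] == "") / len(data) * 100
    for i, name in enumerate(header)
}
print(null_rates)
```

Here `id` has a 0% null rate, `name` is empty in one of three rows (about 33%), and `nickname` is empty everywhere, i.e. 100%.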
How does type inference work in this CSV validator?
The tool inspects all non-empty values in each column and tries to classify the column as integer, decimal, date (ISO 8601 format), boolean (true/false), email, or string. A column is labelled with a type only if every non-empty value matches that type — if there is any mismatch, it falls back to "string". This helps you catch columns that should be numeric but contain an accidental text value, or date columns where one row has a differently formatted date.
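A simplified sketch of that all-or-nothing inference in Python; the regexes below are illustrative stand-ins, not the tool's actual patterns:

```python
import re

# Candidate types, checked in order of specificity. A column gets a type
# only if every non-empty value matches; otherwise it falls back to "string".
CHECKS = [
    ("integer", re.compile(r"-?\d+")),
    ("decimal", re.compile(r"-?\d+\.\d+")),
    ("date",    re.compile(r"\d{4}-\d{2}-\d{2}")),       # ISO 8601 date
    ("boolean", re.compile(r"(?:true|false)", re.I)),
    ("email",   re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")),
]

def infer_type(values):
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for name, pattern in CHECKS:
        if all(pattern.fullmatch(v) for v in non_empty):
            return name
    return "string"

print(infer_type(["1", "2", ""]))       # integer (empties are ignored)
print(infer_type(["2024-01-05", "x"]))  # string (one mismatch demotes it)
```

The single mismatched value in the second column is exactly the kind of stray text in a date column that this rule is designed to surface.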