skip blank rows in load_csv, closes #29 #46

Open
HrachShah wants to merge 1 commit into simonw:main from HrachShah:fix/issue29-skip-empty-trailing-lines
@HrachShah

Closes #29.

A file ending with a blank line (an extra \n after the final row's own newline, or any number of blank trailing lines) used to produce an empty dict for each blank row, with every column missing. Looking up the key column on that dict then crashed with KeyError: 'a' (or whichever key column was passed), and the error message gave no hint that the actual problem was a stray blank line at the end of the file.

csv.reader returns an empty list for a fully blank row (a record of just \n), so filtering those out before building the dicts is enough - blank rows in the middle of the file are skipped the same way. Whitespace-only and comma-only lines still parse as data rows, which preserves the existing behaviour for inputs where those carry meaning (e.g. a row of , is a row with two empty fields, not a blank row).
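The behaviour described above can be checked in plain Python. This is a minimal sketch, not the code in this PR: read_rows and its skip_blank flag are hypothetical names used only for illustration, and the real load_csv may be structured differently.

```python
import csv
import io


def read_rows(text, skip_blank):
    # Hypothetical helper, not csv-diff's load_csv:
    # first record is the header, every later record becomes a dict.
    reader = csv.reader(io.StringIO(text))
    headings = next(reader)
    # csv.reader yields [] for a fully blank line, so "if line" drops it.
    return [
        dict(zip(headings, line))
        for line in reader
        if line or not skip_blank
    ]


text = "a,b,c\n1,2,3\n\n"

# Without the filter, the blank record zips against nothing and
# leaves an empty dict, so a later row["a"] lookup raises KeyError.
unfiltered = read_rows(text, skip_blank=False)
assert unfiltered[-1] == {}

# With the filter, only the real data row remains.
filtered = read_rows(text, skip_blank=True)
assert filtered == [{"a": "1", "b": "2", "c": "3"}]

# Whitespace-only and comma-only lines parse as non-empty lists,
# so the same filter leaves them in place as data rows.
assert list(csv.reader(io.StringIO(" \n,\n"))) == [[" "], ["", ""]]
```

Filtering on the truthiness of the parsed record, rather than stripping the raw line, is what keeps whitespace-only and comma-only rows intact.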

Reproduce

Before:

$ printf "a,b,c\n1,2,3\n\n" > /tmp/a.csv
$ csv-diff /tmp/a.csv /tmp/a.csv --key a
KeyError: 'a'

After:

$ csv-diff /tmp/a.csv /tmp/a.csv --key a
$ # exits 0, no output (the two files diff to themselves)

Tests added

tests/test_csv_diff.py:

  • test_trailing_blank_line_is_skipped: loads a CSV with a single trailing blank line and asserts the key lookup no longer raises and the loaded rows are correct.
  • test_multiple_blank_lines_and_interior_blank_skipped: loads a CSV with several trailing blank lines plus a blank line in the middle, asserts the rows dict is identical to the same data without the blank lines.
  • test_compare_with_trailing_blank_lines: runs compare() against two CSVs that both end in blank lines and asserts the diff result is equivalent to comparing the same data without the blank lines.
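For readers who want the shape of these tests without opening the diff, here is a rough sketch of the interior-plus-trailing case. It is an assumption-laden illustration, not the test code from this PR: load_keyed is a hypothetical stand-in for load_csv with the blank-row filter applied, and the sample data is made up.

```python
import csv
import io


def load_keyed(text, key):
    # Hypothetical stand-in for load_csv: returns {key_value: row_dict},
    # skipping fully blank records (csv.reader yields [] for those).
    reader = csv.reader(io.StringIO(text))
    headings = next(reader)
    rows = (dict(zip(headings, line)) for line in reader if line)
    return {row[key]: row for row in rows}


# Made-up sample data: the same two rows, with and without blank lines.
CLEAN = "id,name\n1,first\n2,second\n"
WITH_BLANKS = "id,name\n1,first\n\n2,second\n\n\n"


def test_blank_lines_do_not_change_rows():
    # Interior and trailing blank lines should leave the rows dict unchanged.
    assert load_keyed(WITH_BLANKS, "id") == load_keyed(CLEAN, "id")


test_blank_lines_do_not_change_rows()
```

Comparing against the blank-free input, instead of hard-coding the expected dict, keeps the assertion focused on the one property under test.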

All 27 tests pass (24 existing + 3 new).

…KeyError

Closes simonw#29. A file ending with \n (or any number of blank trailing lines)
used to produce a dict entry that was missing every column, which then
crashed with KeyError when the caller tried to access the key column.
csv.reader returns an empty list for a fully blank row, so filtering
those out before building dicts is enough - blank rows in the middle of
the file are skipped the same way. Whitespace-only or comma-only lines
still parse as data, which preserves the existing behaviour for inputs
where those carry meaning.
Development

Successfully merging this pull request may close these issues.

BUG: new line at end of file causes crash