Why the CSV standard library is broken (and how to fix it), Part IV or Numerics a.k.a. Auto-Magic Type Inference for Strings and Numbers?

Gerald Bauer

2018-10-11 15:51:22 UTC

Hello,

I've written a new (and fourth) episode on why the CSV standard library is
broken, broken, broken (and how to fix it).

Let's have a look at numerics a.k.a. auto-magic type inference for
strings and numbers [1].

Here's the challenge for the standard csv library.
Let's read data.csv:

1,2,3
"4","5","6"

Use these two popular rules (bonus points for handling NaN, that is, not a number):

Rule 1: Use "un-quoted" values for float numbers e.g. 1,2,3 or 1.0,
2.0, 3.0 etc.

Rule 2: Use quoted values for "non-numeric" strings e.g. "4", "5", "6"
or "Hello, World!" etc.
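
To make the two rules concrete, here's a minimal sketch in plain Ruby.
Note that `parse_numeric_line` is a made-up helper for illustration only,
it is not how the new csv reader (or the standard csv library) is
implemented, and it assumes every un-quoted value is a valid number:

```ruby
require 'strscan'

# Hypothetical helper (illustration only): split one CSV line and apply
# Rule 1 (un-quoted => Float) and Rule 2 (quoted => String).
def parse_numeric_line(line)
  scanner = StringScanner.new(line.chomp)
  values  = []
  loop do
    if scanner.scan(/"((?:[^"]|"")*)"/)
      # Rule 2: a quoted value stays a string; "" inside is an escaped quote
      values << scanner[1].gsub('""', '"')
    else
      # Rule 1: an un-quoted value becomes a float
      # (Float() raises ArgumentError if the value is not numeric)
      values << Float(scanner.scan(/[^,]*/))
    end
    break unless scanner.scan(/,/)   # stop when no more separators follow
  end
  values
end

data = <<~TXT
  1,2,3
  "4","5","6"
TXT

records = data.each_line.map { |line| parse_numeric_line(line) }
pp records
# => [[1.0, 2.0, 3.0], ["4", "5", "6"]]
```

The point of the sketch: the quoted-or-not information has to survive
until the conversion step, otherwise "4" and 4 look the same.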

In the new csv reader it works like this :-):

records = Csv.numeric.read( 'data.csv' )
pp records
# => [[1.0, 2.0, 3.0],
# ["4", "5", "6"]]

And with your own not-a-number constants / configuration:

records = Csv.numeric.parse( '1,2,NAN,#NA', nan: ['NAN', '#NA'] )
pp records
# => [[1.0, 2.0, NaN, NaN]]
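
What the nan option boils down to can be sketched in a few lines of plain
Ruby. Again, `to_number` is a made-up name for illustration and only an
assumption about the idea, not the library's actual code:

```ruby
# Hypothetical helper (illustration only): convert one un-quoted value
# to a Float, mapping any of the configured constants to Float::NAN.
def to_number(value, nan: [])
  return Float::NAN if nan.include?(value)
  Float(value)   # raises ArgumentError for anything else non-numeric
end

# A naive split is good enough here because no value is quoted.
values = '1,2,NAN,#NA'.split(',').map do |value|
  to_number(value, nan: ['NAN', '#NA'])
end
pp values
# => [1.0, 2.0, NaN, NaN]
```

Keep in mind that NaN never equals itself, so check the result with
`value.nan?` and not with `==`.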

> I disagree that it's broken.
> It's implementing the [strict] RFC [CSV format] and gives you the tools
> that allow you to be less strict.

Anyone? Can you show us how you read the numerics variant
and handle Not a Number (NaN) values with the standard csv library?

Questions and comments welcome. Cheers. Prost.

PS: If you want to see other (more) CSV formats / dialects pre-configured
and supported "out-of-the-box" in the new csv reader, please tell.

[1] https://github.com/csv11/docs/blob/master/csv-numerics.md
