Gerald Bauer
2018-10-11 15:51:22 UTC
Hello,
I've written a new (and fourth) episode on why the CSV standard library is
broken, broken, broken (and how to fix it).
Let's have a look at numerics a.k.a. auto-magic type inference for
strings and numbers [1].
Here's the challenge for the standard csv library.
Let's read data.csv:
1,2,3
"4","5","6"
Using these two popular rules (bonus for NaNs - not a number):
Rule 1: Use "un-quoted" values for float numbers e.g. 1,2,3 or 1.0,
2.0, 3.0 etc.
Rule 2: Use quoted values for "non-numeric" strings e.g. "4", "5", "6"
or "Hello, World!" etc.
In the new csv reader it works like this :-):
records = Csv.numeric.read( 'data.csv' )
pp records
# => [[1.0, 2.0, 3.0],
#     ["4", "5", "6"]]
And with your own not-a-number constants / configuration:
records = Csv.numeric.parse( '1,2,NAN,#NA', nan: ['NAN', '#NA'] )
pp records
# => [[1.0, 2.0, NaN, NaN]]
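To make the two rules and the NaN option concrete, here is a minimal pure-Ruby sketch of the idea for a single line. It is not the new csv reader's implementation: parse_numeric_line is a hypothetical helper made up for illustration, and it assumes no escaped quotes ("") and no multi-line values.

```ruby
require 'strscan'

# Hypothetical helper (illustration only, not part of any library).
# Rule 1: unquoted values become floats.
# Rule 2: quoted values stay strings.
# Bonus:  unquoted values listed in nan: become Float::NAN.
def parse_numeric_line(line, nan: [])
  scanner = StringScanner.new(line.chomp)
  values  = []
  loop do
    scanner.skip(/[ \t]*/)                 # ignore leading blanks
    if scanner.scan(/"([^"]*)"/)           # quoted value: keep as string
      values << scanner[1]
      scanner.skip(/[ \t]*/)
    else                                   # unquoted value: NaN or float
      raw = scanner.scan(/[^,]*/).strip
      values << (nan.include?(raw) ? Float::NAN : Float(raw))
    end
    break unless scanner.scan(/,/)         # no more separators: done
  end
  values
end

pp parse_numeric_line('1,2,3')             # => [1.0, 2.0, 3.0]
pp parse_numeric_line('"4","5","6"')       # => ["4", "5", "6"]
pp parse_numeric_line('1,2,NAN,#NA', nan: ['NAN', '#NA'])
# => [1.0, 2.0, NaN, NaN]
```

Float() is strict on purpose here: an unquoted value that is neither a number nor a configured NaN constant raises an ArgumentError instead of silently turning into a string.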
Anyone? Show us how you handle the reading of the numerics
variant and Not a Number (NaN) with the standard csv library?
Questions and comments welcome. Cheers. Prost.
PS: If you want to see other (more) CSV formats / dialects pre-configured
and supported "out-of-the-box" in the new csv reader, please tell.
[1] https://github.com/csv11/docs/blob/master/csv-numerics.md
Unsubscribe: <mailto:ruby-talk-***@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-talk>
I disagree that it's broken.
It's implementing the [strict] RFC [CSV format] and gives you the tools that allow you to be less strict.
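As a rough sketch of what "the tools" could look like for the challenge, a custom converter can apply both rules with the standard csv library. This rests on an assumption: the csv version in use must be recent enough that the second converter argument (CSV::FieldInfo) responds to quoted?; older versions do not expose that, as far as I know. numeric_converter is a hypothetical helper name, not something the library ships.

```ruby
require 'csv'

# Hypothetical helper (not part of the csv library): builds a converter
# for Rule 1, Rule 2 and the NaN bonus.
# Assumption: CSV::FieldInfo responds to quoted? (recent csv versions only).
def numeric_converter(nan: [])
  lambda do |field, info|
    next field      if info.quoted?          # Rule 2: quoted values stay strings
    next Float::NAN if nan.include?(field)   # bonus: configured NaN constants
    Float(field)                             # Rule 1: unquoted values are floats
  end
end

data = <<~TXT
  1,2,3
  "4","5","6"
TXT

records = CSV.parse(data, converters: [numeric_converter])
pp records

nans = CSV.parse('1,2,NAN,#NA',
                 converters: [numeric_converter(nan: ['NAN', '#NA'])])
pp nans
```

If the assumption holds, this should produce the same records as the Csv.numeric examples in the original post. The converter takes two arguments on purpose: the library passes the field info only to converters that accept more than one argument.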