CSV file format

Why haven't you mastered your CSV yet? It's the UNIX way after all.

It takes three seconds to master CSV. Why is this even a thread?

If you think you've mastered CSV in 3 seconds you don't understand it at all. Please remedy your ignorance: cat-v.org/

I really don't understand what there is to CSV except commas and values. It's a fucking table that uses commas.

"it takes a genius to understand it's simplicity"

That's exactly why it takes 3 seconds to master it. It's very simple.

I learned more productive things, like how to proofread. Bitch.

UNIX way is p shit tbh

tools.ietf.org/html/rfc4180

You're not saying anything. You're fucking retarded.

Take your meme back to /g/ where you tried to force it first.

OP, I've never read such a quote about CSV from Ritchie. I searched and found nothing about it; point to the source.
The one regarding Unix simplicity is true, though.

Now, CSV is a fine format. I've been using it for quite some time and it's really easy to parse, even with simple tools like 'cut'. As long as there's no comma in the value itself.
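
To make it concrete, here's a minimal Python sketch (made-up data) of where a naive comma split, which is effectively what 'cut -d,' does, falls over, and what a real parser does instead:

import csv

line = '"Smith, John",42,"1,234.56"'

# Naive split: the quoted commas break the field count (5 fields, not 3)
print(line.split(','))
# ['"Smith', ' John"', '42', '"1', '234.56"']

# The stdlib csv module honors the quoting: 3 fields
print(next(csv.reader([line])))
# ['Smith, John', '42', '1,234.56']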


Thanks for that RFC, I didn't know about it.

is this a meme or legit?

...

he's right though, if you have commas in your CSV file you are fucked

this is why it takes a genius to understand its simplicity apparently

Because strangely most people are incapable of doing so. Do you have any idea how fucked up the majority of CSV files are? I downloaded 600GB of dox and oh god let me tell you. You have no idea how many rows there are with variable numbers of columns, or files where there is a different delimiter for each row, or data cells which contain the delimiter but where the delimiter isn't escaped and the cell isn't wrapped in a character that identifies it as a string. For fuck's sake, I have files where the row delimiter is CRLF for 99% of the rows and only a CR or an LF for the others.
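
For what it's worth, the mixed CR/LF/CRLF problem is at least mechanically fixable before parsing; the variable column counts aren't. A rough Python sketch (the filename is hypothetical):

import csv
from collections import Counter

# str.splitlines() treats CR, LF, and CRLF all as line breaks,
# so mixed row delimiters come out uniform.
with open("dump.csv", newline="") as f:
    rows = list(csv.reader(f.read().splitlines()))

# Flag rows whose column count disagrees with the majority
expected, _ = Counter(len(r) for r in rows).most_common(1)[0]
bad = [i for i, r in enumerate(rows) if len(r) != expected]
print(f"{len(bad)} of {len(rows)} rows have a suspect column count")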


You're actually supposed to enclose the cell in a string identifier such as quotes (or whatever character(s) you want)
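
That's what RFC 4180 specifies: fields containing the delimiter, quotes, or line breaks get wrapped in double quotes, and embedded quotes are doubled. Any decent library handles it for you; a quick Python sketch:

import csv
import sys

writer = csv.writer(sys.stdout)
writer.writerow(["plain", "has, comma", 'has "quotes"'])
# Output: plain,"has, comma","has ""quotes"""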

and for the other 1%

I got a CSV file with dates in 21-Mar-16 format. Is this normal? I want to kill myself.

Yes, and if you import it into SQL Server it's easy to change. You just define it as a date column and then use the CONVERT function to change it to any format you want.
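
Outside SQL Server it's a one-liner too; day-Mon-year matches standard strptime codes (note %b is locale-dependent, so this assumes an English locale). A Python sketch:

from datetime import datetime

parsed = datetime.strptime("21-Mar-16", "%d-%b-%y")  # %y pivots: 00-68 -> 20xx
print(parsed.date().isoformat())                     # 2016-03-21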

Why did people go with CSV instead of tab-delimited? Tabs look prettier in plaintext editors, and they also occur less frequently, so you don't need to worry as much about escaping.

Because Excel? SQL Server defaults to tabs.

Two-digit years are sub-standard but as long as they're consistent it should be possible to restructure them.


That's just moving the problem from escaping commas to escaping quotation marks. Also does CSV ignore whitespace before/after a comma? It'd be a pain to structure for readability without that.


Because when a particular column is empty, consecutive tabs or tabs at the end of a line aren't readable. Tabs also align irregularly when column widths vary.

Garbage in, garbage out. If the data isn't structured correctly, then your options are
1) dismiss your data and do nothing
2) deal with it by studying the data piece by piece

I always use tabs in my own pet projects for flat data files. Can't be bothered to worry about escaping.
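
And you lose nothing tooling-wise: most CSV libraries take the delimiter as a parameter, Python's csv module included. A minimal sketch:

import csv
import sys

rows = [["name", "score"], ["alice", "10"], ["bob", "7"]]
csv.writer(sys.stdout, delimiter="\t").writerows(rows)
# Reading back is symmetric: csv.reader(f, delimiter="\t")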


It's readable enough for me in vim. I just keep this
highlight ExtraWhitespace ctermbg=4
match ExtraWhitespace /\s\+$/
in .vimrc to see white space at the end of a line, and use :set list when I want to see the actual tab characters.

Most people will use a simple text editor on CSV. Even though some of them let you display trailing whitespace (nano does it too), it's still dangerous.

Another thing I don't like about CSV is that you cannot comment it. In the field I've been learning (surveying), that's actually helpful: you can remove bad measurements without deleting them (deleting them is bad practice), or just annotate data files. The usual output format allows this, but CSV is occasionally used for specific types of data, like control points.

All you need in that case is a boolean column that says whether the data is valid or should be discarded.

You're actually supposed to use characters which won't exist inside the data cells, not go escaping everything.

Depends on the program, but SQL Server Integration Services won't ignore it; you'd have to set the delimiter to comma + space.

That requires special handling by the tool. You could do some sort of pre-process script before feeding it to CAD but that's annoying.


That's the idea, which is why it might've been a better idea to use something else for such a generalized format.

Comma + whitespace (if any), I hope. Indentation is the whole reason you'd do it: so it's easier to read without needing a specialized tool.

Still, maybe | would have been a better choice.

XML has the speed of CSV with the readability of binary
Any thoughts on this?

You should fix your tool then, if the procedures say some entries have to be kept but not processed.

The script would be like 10 lines: drop all marked rows, generate a new file in a "preprocessed" directory. You can easily batch it too: collect all your data, preprocess once, then only run your tool on the filtered datasets.
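
Something like this, say, assuming the flag lives in the last column and marked rows say "invalid" (both conventions are made up here):

import csv
import os
import sys

os.makedirs("preprocessed", exist_ok=True)
for path in sys.argv[1:]:
    with open(path, newline="") as src:
        # keep only rows not flagged as invalid
        rows = [r for r in csv.reader(src) if r and r[-1] != "invalid"]
    out = os.path.join("preprocessed", os.path.basename(path))
    with open(out, "w", newline="") as dst:
        csv.writer(dst).writerows(rows)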


I don't know why people complain about XML. It's mediocre, not terrible. At least the tags make it easy to parse; I've never had issues with XML parsers, and at least it's very clear what the data is. There are also nice schema visualizers. It's not that hard to read either, especially if your editor isn't shit and can fold/beautify.

Of course it's space-inefficient due to being verbose, but if you have such large data that it matters you can just compress it.

Although, all else being equal, it seems much better to just use something like JSON instead of XML. It's more readable in a text editor, doesn't use as much space, and is more straightforward.

Also yes, I think a sensible designer would pick the lowest-frequency character that renders properly and is easy to type. Seems like the CSV authors just picked the first thing that came to mind without giving it a modicum of thought. Probably made sense to them because who would store anything except numbers and the odd alphanumeric-only string anyway, right?

...

Fuck off. XML is the ugliest fucking piece of shit.

It's horrendous, not terrible.

I made a JSON parser in C that's less than 500 lines long, and it posed little challenge thanks to a golden tool no Pajeet YouTube video ever teaches: unions. JSON and CSV are enough for everything; CSV is just as easy (if not easier) to parse than JSON, and lighter.

With all these unreadable tags, you don't know where the data is.

You never tried JSON.

You're not supposed to read or write XML data by hand. It's designed to be human-readable if there's ever a need to do so. It's just as easy to make things unreadable in JSON.

Fuckers send me unescaped commas and it's a mess to process that shit. What a great format, fuck you Ritchie.

XML and JSON are both good in different respects, and both have their own shortcomings, but neither is inherently better or worse than the other; it depends entirely on what you're trying to use them for.

Having said that, I don't like XML for simple API responses or configuration files because it's almost always overkill.

I'm more pointing out a weakness in CSV, which is that it doesn't have many provisions for readability (no comments, and indentation may or may not work depending on how strings are processed). That's no issue when it's used as a pipe between two programs, but it is a problem for something meant to be hand-edited.


Bottom line is the comma is too common.


XML is its own mess of reserved characters and escaping whenever they occur. No thank you.