commons-dev mailing list archives

From sebb <seb...@gmail.com>
Subject Re: [CSV] Inconsistent record separator behavior
Date Wed, 22 Aug 2018 23:23:06 GMT
On 23 August 2018 at 00:01, Bruno P. Kinoshita
<brunodepaulak@yahoo.com.br.invalid> wrote:
>
>>Maybe I'm just not getting it, but it feels pretty messed up :-)
>
>
> Mutual feeling, and +1 for consistency. From what I understood, users should be able
> to parse these crazy CSVs, but if they tried to re-create them, with comments, then they
> wouldn't be able to avoid the println/newline (so it wouldn't be parseable later with the
> same reader).
>
>
> We probably need a ticket for it to aggregate the discussion and maybe a possible solution.

I'm wondering whether we need to be as flexible when *creating* the CSV files.

"Be liberal in what you accept, and conservative in what you send" (Jon Postel)

In this case send == create, as it might be sent to other less liberal readers.

I don't have a problem with the output being less flexible, so long as
it is sufficiently flexible (which I think it likely is already).

I don't think consistency is necessary - or even desirable - here.
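
To make the printComment behaviour described in the quoted text below concrete, here is a minimal sketch. It is not the Commons CSV code: SeparatorSketch and its methods are hypothetical stand-ins, written under the assumption that comments start on a new record, that CR, LF and CRLF inside a comment each trigger one println(), and that println() writes the configured record separator.

```java
import java.util.List;

/** Hypothetical stand-in for a CSV printer; NOT the Commons CSV implementation. */
final class SeparatorSketch {
    private final StringBuilder out = new StringBuilder();
    private final String recordSeparator;
    private final char commentMarker;
    private boolean newRecord = true;

    SeparatorSketch(String recordSeparator, char commentMarker) {
        this.recordSeparator = recordSeparator;
        this.commentMarker = commentMarker;
    }

    /** Ends the current record by writing the record separator, whatever it is. */
    void println() {
        out.append(recordSeparator);
        newRecord = true;
    }

    /** Writes the values comma-separated (no quoting or escaping in this sketch). */
    void printRecord(List<String> values) {
        out.append(String.join(",", values));
        newRecord = false;
        println();
    }

    /** Writes a comment; CR, LF and CRLF in the text each become one println(). */
    void printComment(String comment) {
        if (!newRecord) {
            println();
        }
        out.append(commentMarker).append(' ');
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c == '\r' || c == '\n') {
                // Treat CRLF as a single line break.
                if (c == '\r' && i + 1 < comment.length() && comment.charAt(i + 1) == '\n') {
                    i++;
                }
                println();
                out.append(commentMarker).append(' ');
            } else {
                out.append(c);
            }
        }
        println();
    }

    String result() {
        return out.toString();
    }
}
```

With "!" as the record separator and '#' as the comment marker, printing the record a,b, then the comment "first\nsecond", then the record c,d gives a,b!# first!# second!c,d! in this sketch: the line break inside the comment comes out as "!", so the output contains no real newline at all.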

> Cheers
>
> ________________________________
> From: Benedikt Ritter <britter@apache.org>
> To: Commons Developers List <dev@commons.apache.org>; brunodepaulak@yahoo.com.br
> Sent: Thursday, 23 August 2018 7:10 AM
> Subject: Re: [CSV] Inconsistent record separator behavior
>
>
>
> Hi Bruno,
>
> Am Mi., 22. Aug. 2018 um 15:10 Uhr schrieb Bruno P. Kinoshita
> <brunodepaulak@yahoo.com.br.invalid>:
>
>> Hi,
>>
>>
>> Will try to look at the code and give a better answer during the weekend.
>> But risking a silly question, would it mean that users are not able to
>> parse a CSV unless each CSV row is separated by LF or CRLF?
>
>
> Yes.
>
>
>> I remember getting a CSV in a government website some time ago that was
>> formatted in a very strange way, and if I remember well it was a small
>> file, but without LF or CRLF. I think it was using | to separate the rows,
>> and , for columns.
>>
>
> I didn't know that there are formats that don't use a new line as line
> separator.
>
>
>>
>>
>> Quick search returned at least another person with similar issue
>> https://stackoverflow.com/questions/29903202/how-to-read-csv-on-python-with-newline-separator
>>
>>
>> Not sure if I understood the problem well, but in case it makes sense...
>> my suggestion would be to perhaps confirm if we could change
>> CSVPrinter.printComment to accept other characters for line ending?
>>
>
> The inconsistency I'm seeing is that we on the one hand accept any
> character sequence as a record separator. Comments are in a way like special
> records to me. But our implementation seems to put them on a new "line"
> using the println() method. The println() method in turn uses the record
> separator to start a new record. So it's not necessarily a new line.
> Nevertheless while processing a comment, we look out for CR and LF and then
> we call println() again. Maybe I'm just not getting it, but it feels pretty
> messed up :-)
>
> Regards,
> Benedikt
>
>
>
>>
>>
>> Thanks!
>>
>> Bruno
>>
>>
>> ________________________________
>> From: Benedikt Ritter <britter@apache.org>
>> To: Commons Developers List <dev@commons.apache.org>
>> Sent: Tuesday, 21 August 2018 7:13 PM
>> Subject: [CSV] Inconsistent record separator behavior
>>
>>
>>
>> Hi,
>>
>>
>> we have this strange handling of record separator / line endings in CSV:
>>
>>
>> Users can use whatever character sequence they like as a record separator.
>> I could for example use the ! character to mark the end of a record.
>> Then we have CSVPrinter.printComment(String). This inserts comments into a
>> CSV output. It detects CRLF and calls println() on the CSVFormat, which in
>> turn uses the record separator to indicate a new record...
>>
>> So now I'm thinking: Does it make sense to use anything else but LF or CRLF
>> as record separator? Maybe we should deprecate
>> CSVFormat.recordSeparator(String) and introduce a LineEnding enum where
>> users can choose between LF and CRLF. This way we can make the behavior
>> between parsing and printing consistent.
>>
>>
>> Thoughts?
>>
>> Benedikt
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>
>>
>>
>
>


