commons-dev mailing list archives

From Adrian Crum <adrian.c...@sandglass-software.com>
Subject Re: [CSV] Proposed fix for CSV-35 (Was: Fwd: [jira] [Comment Edited] (CSV-35) Escaped line separators are not supported)
Date Thu, 10 Jul 2014 13:36:04 GMT
I agree that we should stop worrying about edge cases and release a 
version that covers the majority of needs.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 7/10/2014 9:12 AM, Benedikt Ritter wrote:
> 2014-07-09 4:15 GMT+02:00 Gary Gregory <garydgregory@gmail.com>:
>
>> We do have a discrepancy between our format class and lexer (which is
>> hardwired with CR & LF).
>>
>> Ideally, it seems the lexer should pick up its set of EOL strings from the
>> format.
>>
>> I recall reading worries about performance issues with changing this, but
>> either we support all of the EOL strings, including some of the oddball
>> ones like Unicode, or we do not. Perhaps we can have an alternate Lexer
>> that takes a set of EOL strings if performance is really that much worse.
>>
>
> Sounds reasonable, but seems to be a lot of work. Maybe we can just
> document that 1.0 can only handle CR & LF and add the ability for more
> exotic record separators in 1.1. I'm hoping for higher adoption and more
> patches once we have a release on Maven Central.
>
> Benedikt
>
>
>>
>> Gary
>>
>>
>> On Mon, Jul 7, 2014 at 1:47 PM, Benedikt Ritter <britter@apache.org>
>> wrote:
>>
>>> Any thoughts about this fix? Could be a solution to push out 1.0. If we
>>> come up with a more generic solution afterwards, we can still deprecate
>>> escapeCRLFOnce.
>>>
>>> Benedikt
>>>
>>> ---------- Forwarded message ----------
>>> From: Tillmann Gaida (JIRA) <jira@apache.org>
>>> Date: 2014-06-30 10:36 GMT+02:00
>>> Subject: [jira] [Comment Edited] (CSV-35) Escaped line separators are not
>>> supported
>>> To: britter@apache.org
>>>
>>>
>>>
>>> [ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047460#comment-14047460 ]
>>>
>>> Tillmann Gaida edited comment on CSV-35 at 6/30/14 8:34 AM:
>>> ------------------------------------------------------------
>>>
>>> I added a patch "commons-csv CSV-35 escapeCRLFOnce[ test].patch", which
>>> introduces a CSVFormat setting "escapeCRLFOnce", which enables the desired
>>> behaviour in Lexer. It is false by default and I did not change
>>> CSVFormat.MYSQL, although enabling it there might be appropriate. I am not
>>> exactly happy with the naming of the setting. Consider renaming it if you
>>> happen to build upon the patch.
>>>
>>> EDIT: clarity
>>>
>>> EDIT: This is a very specific setting. A cleaner solution would probably be
>>> to allow escaping of record separators by a single escape char. However it
>>> appears that the MYSQL format uses LF as a record separator, so we would
>>> need to have multiple record separators, which in this case would not be
>>> actual record separators.
>>>
>>> I'd argue that CRLF is special enough to have an individual setting, but I
>>> would also agree with having a cleaner CSVFormat. The only real alternative
>>> would be having a way to individually specify character sequences and a
>>> replacement if they are preceded by the escape char.
>>>
>>>
>>>> Escaped line separators are not supported
>>>> -----------------------------------------
>>>>
>>>>                  Key: CSV-35
>>>>                  URL: https://issues.apache.org/jira/browse/CSV-35
>>>>              Project: Commons CSV
>>>>           Issue Type: Bug
>>>>             Reporter: Emmanuel Bourg
>>>>              Fix For: 1.0
>>>>
>>>>          Attachments: CSV-35.patch, commons-csv CSV-35 escapeCRLFOnce
>>>> test.patch, commons-csv CSV-35 escapeCRLFOnce.patch,
>>>> mysql-export-line-terminated-by-crlf.csv,
>>>> mysql-export-line-terminated-by-lf.csv
>>>>
>>>>
>>>> Commons CSV doesn't handle escaped line separators, for example:
>>>> {code}
>>>> value1;value2;value3a\
>>>> value3b
>>>> {code}
>>>> In this case the expected result is:
>>>> {code}["value1", "value2", "value3a\nvalue3b"]{code}
>>>> This kind of escaping is produced by MySQL, whether the field enclosing
>>>> is enabled or not. It's possible to see enclosing quotes and escaped line
>>>> separators like this:
>>>> {code}
>>>> "value1";"value2";"value3a\
>>>> value3b"
>>>> {code}
>>>
>>>
>>>
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.2#6252)
>>>
>>>
>>>
>>> --
>>> http://people.apache.org/~britter/
>>> http://www.systemoutprintln.de/
>>> http://twitter.com/BenediktRitter
>>> http://github.com/britter
>>>
>>
>>
>>
>> --
>> E-Mail: garydgregory@gmail.com | ggregory@apache.org
>> Java Persistence with Hibernate, Second Edition
>> <http://www.manning.com/bauer3/>
>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
>> Spring Batch in Action <http://www.manning.com/templier/>
>> Blog: http://garygregory.wordpress.com
>> Home: http://garygregory.com/
>> Tweet! http://twitter.com/GaryGregory
>>
>
>
>
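
For reference, the behaviour requested in the quoted CSV-35 description can be
sketched outside the library. This is only a minimal illustration, not the
attached patch and not the Commons CSV Lexer. The class name EscapedEolSketch
and its parse method are hypothetical. It assumes a single-character delimiter
and escape character, treats unescaped CR, LF and CRLF as record separators,
ignores enclosing quotes entirely, and assumes that an escaped CRLF pair is
kept in the field value as CR followed by LF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not part of Commons CSV and not the CSV-35 patch.
final class EscapedEolSketch {

    // Splits input into records and fields. An escape character followed by
    // LF, CR or CRLF keeps the line break inside the current field instead of
    // ending the record. Quote handling is deliberately left out.
    static List<List<String>> parse(String input, char delimiter, char escape) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int n = input.length();
        int i = 0;
        while (i < n) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < n) {
                char next = input.charAt(i + 1);
                if (next == '\r' && i + 2 < n && input.charAt(i + 2) == '\n') {
                    // Assumption: one escape char covers the whole CRLF pair.
                    field.append("\r\n");
                    i += 3;
                } else {
                    // Escaped LF, CR, delimiter or escape char: keep it literally.
                    field.append(next);
                    i += 2;
                }
            } else if (c == delimiter) {
                record.add(field.toString());
                field.setLength(0);
                i++;
            } else if (c == '\r' || c == '\n') {
                // Unescaped line break ends the record; CRLF counts once.
                if (c == '\r' && i + 1 < n && input.charAt(i + 1) == '\n') {
                    i++;
                }
                i++;
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else {
                field.append(c);
                i++;
            }
        }
        // Flush a final record that is not terminated by a line break.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }
}
```

Calling parse with ';' as the delimiter and '\' as the escape character on the
first example from the issue yields a single record whose third field contains
the line feed. Taking the set of record separators from the format, as
suggested earlier in the thread, would mean replacing the hardwired CR and LF
checks in this sketch with a lookup against a configurable set.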


