commons-dev mailing list archives

From Gary Gregory <garydgreg...@gmail.com>
Subject Re: [CSV] Proposed fix for CSV-35 (Was: Fwd: [jira] [Comment Edited] (CSV-35) Escaped line separators are not supported)
Date Thu, 10 Jul 2014 14:38:37 GMT
I think we can document what we have now and what our possible road map is.

Let's cut an RC and see what happens.

Are you up for RM'ing?

Gary


On Thu, Jul 10, 2014 at 4:12 AM, Benedikt Ritter <britter@apache.org> wrote:

> 2014-07-09 4:15 GMT+02:00 Gary Gregory <garydgregory@gmail.com>:
>
> > We do have a discrepancy between our format class and lexer (which is
> > hardwired with CR & LF).
> >
> > Ideally, it seems the lexer should pick up its set of EOL Strings from the
> > format.
> >
> > I recall reading worries about performance issues from changing this, but
> > either we support all of the EOL strings, including some of the oddball ones
> > like Unicode, or we do not. Perhaps we can have an alternate Lexer that takes
> > a set of EOL strings if performance is really that much worse.
> >
>
> Sounds reasonable, but seems to be a lot of work. Maybe we can just
> document that 1.0 can only handle CR & LF and add the ability for more
> exotic record separators in 1.1. I'm hoping for higher adoption and more
> patches once we have a release on Maven Central.
>
> Benedikt
>
>
> >
> > Gary
> >
> >
> > On Mon, Jul 7, 2014 at 1:47 PM, Benedikt Ritter <britter@apache.org>
> > wrote:
> >
> > > Any thoughts about this fix? Could be a solution to push out 1.0. If we
> > > come up with a more generic solution afterwards, we can still deprecate
> > > escapeCRLFOnce.
> > >
> > > Benedikt
> > >
> > > ---------- Forwarded message ----------
> > > From: Tillmann Gaida (JIRA) <jira@apache.org>
> > > Date: 2014-06-30 10:36 GMT+02:00
> > > Subject: [jira] [Comment Edited] (CSV-35) Escaped line separators are not
> > > supported
> > > To: britter@apache.org
> > >
> > >
> > >
> > >     [ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047460#comment-14047460 ]
> > >
> > > Tillmann Gaida edited comment on CSV-35 at 6/30/14 8:34 AM:
> > > ------------------------------------------------------------
> > >
> > > I added a patch "commons-csv CSV-35 escapeCRLFOnce[ test].patch", which
> > > introduces a CSVFormat setting "escapeCRLFOnce", which enables the desired
> > > behaviour in Lexer. It is false by default and I did not change
> > > CSVFormat.MYSQL, although doing so might be appropriate. I am not exactly
> > > happy with the naming of the setting. Consider renaming it if you happen to
> > > build upon the patch.
> > >
> > > EDIT: clarity
> > >
> > > EDIT: This is a very specific setting. A cleaner solution would probably be
> > > to allow escaping of record separators by a single escape char. However, it
> > > appears that the MYSQL format uses LF as a record separator, so we would
> > > need to have multiple record separators, which in this case would not be
> > > actual record separators.
> > >
> > > I'd argue that CRLF is special enough to have an individual setting, but I
> > > would also agree with having a cleaner CSVFormat. The only real alternative
> > > would be having a way to individually specify character sequences and a
> > > replacement if they are preceded by the escape char.
> > >
> > >
> > > > Escaped line separators are not supported
> > > > -----------------------------------------
> > > >
> > > >                 Key: CSV-35
> > > >                 URL: https://issues.apache.org/jira/browse/CSV-35
> > > >             Project: Commons CSV
> > > >          Issue Type: Bug
> > > >            Reporter: Emmanuel Bourg
> > > >             Fix For: 1.0
> > > >
> > > >         Attachments: CSV-35.patch,
> > > >                      commons-csv CSV-35 escapeCRLFOnce test.patch,
> > > >                      commons-csv CSV-35 escapeCRLFOnce.patch,
> > > >                      mysql-export-line-terminated-by-crlf.csv,
> > > >                      mysql-export-line-terminated-by-lf.csv
> > > >
> > > >
> > > > Commons CSV doesn't handle escaped line separators, for example:
> > > > {code}
> > > > value1;value2;value3a\
> > > > value3b
> > > > {code}
> > > > In this case the expected result is:
> > > > {code}["value1", "value2", "value3a\nvalue3b"]{code}
> > > > This kind of escaping is produced by MySQL, whether the field enclosing
> > > > is enabled or not. It's possible to see enclosing quotes and escaped line
> > > > separators like this:
> > > > {code}
> > > > "value1";"value2";"value3a\
> > > > value3b"
> > > > {code}
> > >
> > > --
> > > http://people.apache.org/~britter/
> > > http://www.systemoutprintln.de/
> > > http://twitter.com/BenediktRitter
> > > http://github.com/britter
> > >
> >



-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
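
The escaping described in the quoted CSV-35 report can be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not the escapeCRLFOnce patch and not the Commons CSV Lexer: the class EscapedEolSketch and the method parseRecord are hypothetical names, quoting is ignored entirely, and an escape character followed by any other character simply yields that character (a simplification of MySQL's escape rules).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Commons CSV. */
final class EscapedEolSketch {

    /**
     * Reads the first record from the input and splits it into fields.
     * An escape character directly followed by CR, LF, or CRLF produces a
     * single '\n' inside the current field instead of ending the record.
     * Assumption: enclosing quotes are not handled at all.
     */
    static List<String> parseRecord(String input, char delimiter, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                if (next == '\r' || next == '\n') {
                    // Escaped line break: keep it as data, normalised to LF.
                    field.append('\n');
                    boolean crlf = next == '\r'
                            && i + 2 < input.length()
                            && input.charAt(i + 2) == '\n';
                    i += crlf ? 3 : 2; // consume CRLF as one separator
                } else {
                    // Simplification: any other escaped char is taken literally.
                    field.append(next);
                    i += 2;
                }
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
                i++;
            } else if (c == '\r' || c == '\n') {
                break; // an unescaped line break ends the record
            } else {
                field.append(c);
                i++;
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```

With the first example from the report, parseRecord("value1;value2;value3a\\\nvalue3b", ';', '\\') returns the three fields value1, value2 and "value3a\nvalue3b", which matches the expected result stated in the issue. Hardwiring CR and LF here mirrors the lexer limitation discussed above; a more general version would take its set of EOL strings from the format.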
