Message-ID: <53BE96C4.7000704@sandglass-software.com>
Date: Thu, 10 Jul 2014 14:36:04 +0100
From: Adrian Crum
Organization: Sandglass Software
To: Commons Developers List
Subject: Re: [CSV] Proposed fix for CSV-35 (Was: Fwd: [jira] [Comment Edited] (CSV-35) Escaped line separators are not supported)

I agree that we should stop worrying about edge cases and release a
version that covers the majority of needs.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 7/10/2014 9:12 AM, Benedikt Ritter wrote:
> 2014-07-09 4:15 GMT+02:00 Gary Gregory :
>
>> We do have a discrepancy between our format class and lexer (which is
>> hardwired with CR & LF).
>>
>> Ideally, it seems the lexer should pick up its set of EOL Strings from the
>> format.
>>
>> I recall reading worries of performance issues changing this but either we
>> support all of the EOL strings including some of the oddball ones like
>> Unicode, or we do not. Perhaps we can have an alternate Lexer that takes a
>> set of EOL strings if performance is really that much worse.
>>
>
> Sounds reasonable, but seems to be a lot of work. Maybe we can just
> document that 1.0 can only handle CR & LF and add the ability for more
> exotic record separators in 1.1. I'm hoping for higher adoption and more
> patches once we have a release on Maven Central.
>
> Benedikt
>
>>
>> Gary
>>
>>
>> On Mon, Jul 7, 2014 at 1:47 PM, Benedikt Ritter
>> wrote:
>>
>>> Any thoughts about this fix? Could be a solution to push out 1.0. If we
>>> come up with a more generic solution afterwards, we can still deprecate
>>> escapeCRLFOnce.
>>>
>>> Benedikt
>>>
>>> ---------- Forwarded message ----------
>>> From: Tillmann Gaida (JIRA)
>>> Date: 2014-06-30 10:36 GMT+02:00
>>> Subject: [jira] [Comment Edited] (CSV-35) Escaped line separators are not
>>> supported
>>> To: britter@apache.org
>>>
>>> [ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047460#comment-14047460 ]
>>>
>>> Tillmann Gaida edited comment on CSV-35 at 6/30/14 8:34 AM:
>>> ------------------------------------------------------------
>>>
>>> I added a patch "commons-csv CSV-35 escapeCRLFOnce[ test].patch", which
>>> introduces a CSVFormat setting "escapeCRLFOnce", which enables the desired
>>> behaviour in Lexer. It is false by default and I did not change
>>> CSVFormat.MYSQL, which might be appropriate. I am not exactly happy with the
>>> naming of the setting. Consider renaming it if you happen to build upon the
>>> patch.
>>>
>>> EDIT: clarity
>>>
>>> EDIT: This is a very specific setting. A cleaner solution would probably be
>>> to allow escaping of record separators by a single escape char. However, it
>>> appears that the MYSQL format uses LF as a record separator, so we would
>>> need to have multiple record separators, which in this case would not be
>>> actual record separators.
>>>
>>> I'd argue that CRLF is special enough to have an individual setting, but I
>>> would also agree with having a cleaner CSVFormat. The only real alternative
>>> would be having a way to individually specify character sequences and a
>>> replacement if they are preceded by the escape char.
>>>
>>>
>>> was (Author: tillmann gaida):
>>> I added a patch "commons-csv CSV-35 escapeCRLFOnce[ test].patch", which
>>> introduces a CSVFormat setting "escapeCRLFOnce", which enables the desired
>>> behaviour in Lexer. It is false by default and I did not change
>>> CSVFormat.MYSQL, which might be appropriate. I am not exactly happy with the
>>> naming of the setting. Consider renaming it if you happen to build upon the
>>> patch.
>>>
>>> EDIT: clarity
>>>
>>>> Escaped line separators are not supported
>>>> -----------------------------------------
>>>>
>>>> Key: CSV-35
>>>> URL: https://issues.apache.org/jira/browse/CSV-35
>>>> Project: Commons CSV
>>>> Issue Type: Bug
>>>> Reporter: Emmanuel Bourg
>>>> Fix For: 1.0
>>>>
>>>> Attachments: CSV-35.patch, commons-csv CSV-35 escapeCRLFOnce test.patch,
>>>> commons-csv CSV-35 escapeCRLFOnce.patch,
>>>> mysql-export-line-terminated-by-crlf.csv,
>>>> mysql-export-line-terminated-by-lf.csv
>>>>
>>>>
>>>> Commons CSV doesn't handle escaped line separators, for example:
>>>> {code}
>>>> value1;value2;value3a\
>>>> value3b
>>>> {code}
>>>> In this case the expected result is:
>>>> {code}["value1", "value2", "value3a\nvalue3b"]{code}
>>>> This kind of escaping is produced by MySQL, whether the field enclosing
>>>> is enabled or not. It's possible to see enclosing quotes and escaped line
>>>> separators like this:
>>>> {code}
>>>> "value1";"value2";"value3a\
>>>> value3b"
>>>> {code}
>>>
>>>
>>>
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.2#6252)
>>>
>>>
>>>
>>> --
>>> http://people.apache.org/~britter/
>>> http://www.systemoutprintln.de/
>>> http://twitter.com/BenediktRitter
>>> http://github.com/britter
>>>
>>
>>
>>
>> --
>> E-Mail: garydgregory@gmail.com | ggregory@apache.org
>> Java Persistence with Hibernate, Second Edition
>> JUnit in Action, Second Edition
>> Spring Batch in Action
>> Blog: http://garygregory.wordpress.com
>> Home: http://garygregory.com/
>> Tweet! http://twitter.com/GaryGregory
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org
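
The behaviour requested in the CSV-35 report quoted above can be sketched in plain Java. This is only a minimal illustration, not the Commons CSV Lexer and not the attached escapeCRLFOnce patch. The class name EscapedEolSketch and the method readRecord are hypothetical, enclosing quotes are ignored, and mapping an escaped CRLF to a single '\n' is an assumption rather than something the thread confirms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT the Commons CSV Lexer and NOT the escapeCRLFOnce patch.
final class EscapedEolSketch {

    /**
     * Reads the first record of the input, splitting fields on the delimiter.
     * An escape character directly followed by LF, CR, or CRLF adds one '\n'
     * to the current field instead of ending the record (assumed behaviour).
     * Enclosing quotes are not handled.
     */
    static List<String> readRecord(String input, char delimiter, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                if (next == '\r' || next == '\n') {
                    // Escaped line break: keep it as field content.
                    field.append('\n');
                    i += 2;
                    // Assumption: an escaped CRLF counts as one line break.
                    if (next == '\r' && i < input.length() && input.charAt(i) == '\n') {
                        i++;
                    }
                } else {
                    // Any other escaped character is taken literally.
                    field.append(next);
                    i += 2;
                }
            } else if (c == delimiter) {
                // Delimiter closes the current field.
                fields.add(field.toString());
                field.setLength(0);
                i++;
            } else if (c == '\r' || c == '\n') {
                // Unescaped line break ends the record.
                break;
            } else {
                field.append(c);
                i++;
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // First example from the CSV-35 description, followed by a second record.
        String input = "value1;value2;value3a\\\nvalue3b\nnext;record";
        System.out.println(readRecord(input, ';', '\\'));
    }
}
```

With the first example from the report as input, readRecord returns three fields, the last one holding value3a and value3b joined by a line feed. Because CR and LF are hardwired here, the sketch has the same limitation discussed earlier in the thread: record separators other than CR and LF are not recognised.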