Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 92774 invoked from network); 25 Jun 2008 15:45:06 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 25 Jun 2008 15:45:06 -0000 Received: (qmail 57534 invoked by uid 500); 25 Jun 2008 15:45:06 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 57333 invoked by uid 500); 25 Jun 2008 15:45:05 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 57322 invoked by uid 99); 25 Jun 2008 15:45:05 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 25 Jun 2008 08:45:05 -0700 X-ASF-Spam-Status: No, hits=-1.0 required=10.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [192.18.43.132] (HELO sca-es-mail-1.sun.com) (192.18.43.132) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 25 Jun 2008 15:44:12 +0000 Received: from fe-sfbay-10.sun.com ([192.18.43.129]) by sca-es-mail-1.sun.com (8.13.7+Sun/8.12.9) with ESMTP id m5PFiL2n024717 for ; Wed, 25 Jun 2008 08:44:22 -0700 (PDT) Received: from conversion-daemon.fe-sfbay-10.sun.com by fe-sfbay-10.sun.com (Sun Java System Messaging Server 6.2-8.04 (built Feb 28 2007)) id <0K3000501ZCFBI00@fe-sfbay-10.sun.com> (original mail from Richard.Hillegas@Sun.COM) for derby-user@db.apache.org; Wed, 25 Jun 2008 08:44:21 -0700 (PDT) Received: from richard-hillegas-computer.local ([129.150.16.243]) by fe-sfbay-10.sun.com (Sun Java System Messaging Server 6.2-8.04 (built Feb 28 2007)) with ESMTPSA id <0K30008MNZPQ64A0@fe-sfbay-10.sun.com> for derby-user@db.apache.org; Wed, 25 Jun 2008 08:44:14 -0700 (PDT) Date: Wed, 25 Jun 2008 08:44:13 -0700 From: Rick Hillegas Subject: Re: CSV with fails with doublequote inside the value In-reply-to: <7b813ada0806250359r75c94ccfob072d138b2ce011c@mail.gmail.com> Sender: Richard.Hillegas@Sun.COM To: Derby Discussion Message-id: <486267CD.4080809@sun.com> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=ISO-8859-1 Content-transfer-encoding: 7BIT References: <7b813ada0806250359r75c94ccfob072d138b2ce011c@mail.gmail.com> User-Agent: Thunderbird 2.0.0.14 (Macintosh/20080421) X-Virus-Checked: Checked by ClamAV on apache.org Hi Albert, Derby's understanding of CSV predates RFC 4180. For instance, Derby trims leading and trailing spaces from imported columns although this behavior is forbidden by RFC 4180. Derby's CSV dialect is described in the Tools Guide in a section titled "File format for input and output". This description is impressionistic rather than rigorous. I don't think that the grammar of Derby's CSV dialect is published anywhere. Other than that sketch in the Tools Guide, I think that the only contract is that Derby promises to be able to re-import anything which Derby itself exported. In practice, I think that this means that plenty of edge cases plague Derby when it imports files exported by other data sources. A lot of these edge cases can be handled by using the middle arguments of SYSCS_IMPORT_TABLE to fine-tune delimiters and code sets. If you can't fine-tune SYSCS_IMPORT_TABLE well enough, you can always write your own table function to wrap the foreign data file and then import the file like this insert into MyTable select * from table( MyTableFunction( '/path/to/the/data/file' ) ) s Given the guidelines sketched in the Tools Guide, I would expect Derby to object to the file you are trying to import. If you want Derby to include the extra double-quote in the imported text, then I would escape it with another double-quote. For instance, the following file: "foo", 1, "bar" "wibble""", 2, "wombat" imports successfully for me: ij> call syscs_util.syscs_import_table ( 'APP', 'T', 'doubleQuote.csv', null, null, null, 0 ); 0 rows inserted/updated/deleted ij> select * from t; A |B |C ----------------------------------------------------------------------------------------------------------------- foo |1 |bar wibble" |2 |wombat 2 rows selected Hope this helps, -Rick Albert Kam wrote: > Hello Apache Derby, > > I'm using db-derby-10.4.1.3-bin, and when i'm importing a csv file > like this : > > *ij> CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE > (null,'SMS_OUT','/home/albert/mnt/linuxdb/rnd/sofco_sms/CSVs/sms_out.csv',null,null,null,0); > * > an error pops out like this : > *ERROR XIE0R: Import error on line 11 of file > /home/albert/mnt/linuxdb/rnd/sofco_sms/CSVs/sms_out.csv: Data found on > line 11 for column 3 after the stop delimiter. > ERROR XIE03: Data found on line 11 for column 3 after the stop delimiter. > * > I check the file, and in the line 11, it goes like this : > "92","+628xxxxxxx","*Testinggg -__-"*","2006-08-03 > 11:43:42","0","0","-1","-1","-1" > (Notice the -__-"") the content should be *Testinggg -__-"* > > So, i tried changing the *Testinggg -__-"* into *Testinggg -__-\"*, > but still errornous .. > > Later after googling for a while, i found out about this : > http://www.perlmonks.org/?node_id=678257 > > Here's one interesting quote : > / > > CSV as defined by RFC 4180 does > not "escape" double quotes with a backslash, but rather by an > additional set of double quotes. Your parser fails to handle this > format properly. > > CSV is hard. > > / > Is this a bug or is it something that i did ? For now, i removed the > doublequotes and it worked like charm. > > Regards, > Albert Kam > > -- > Do not pursue the past. Do not lose yourself in the future. > The past no longer is. The future has not yet come. > Looking deeply at life as it is in the very here and now, > the practitioner dwells in stability and freedom. > (Thich Nhat Hanh)