Message-ID: <454A7F1B.5050202@sbcglobal.net>
Date: Thu, 02 Nov 2006 15:28:27 -0800
From: Mike Matrigali
Reply-To: mikem_app@sbcglobal.net
To: derby-dev@db.apache.org
Subject: Re: (DERBY-378) implementing import/export of large objects...
References: <45410499.808@gmail.com> <45478117.9010101@apache.org> <45491975.9090908@gmail.com> <454A4D7C.2010805@sbcglobal.net> <454A7105.60300@gmail.com>
In-Reply-To: <454A7105.60300@gmail.com>

Suresh Thalamati wrote:
> Mike Matrigali wrote:
>
>> Suresh Thalamati wrote:
>>
>>> Daniel John Debrunner wrote:
>>>
>>>> Suresh Thalamati wrote:
>>>>
>>>>> BLOB:
>>>
>>> ....
>>>
>>>>> 1) Allow import/export of blob data only when it is stored
>>>>> in an external file.
>>>>>
>>>>> 2) Write the blob data to the export file along with other data
>>>>> types during export, assuming the blob data does not contain any
>>>>> delimiters, and throw an error on import if it finds delimiters
>>>>> inside the data, by interpreting it using the same code-set as the
>>>>> other character data.
>>>>
>>>> I say option 1) and I assume it's for all binary data, not just
>>>> BLOB, e.g. VARCHAR FOR BIT DATA etc. Seems like with binary data the
>>>> chance of having a problematic character in the text file is close
>>>> to 100%.
>>>>
>>>> Dan.
>>>
>>> Thanks for reading the proposal, Dan. I agree with you, the chance of
>>> finding a delimiter character inside binary data is very high. I
>>> will go with option 1. Your assumption is correct, it applies to
>>> all the binary data.
>>
>> I also agree, it seems like a reasonable first step to default binary
>> export/import to external files.
>>
>> I am probably missing something here though.
>> I thought for char data
>> there is an algorithm that handles delimiter data within it. Why does
>> that not work for binary data also? Is it a codeset issue?
>>
>
> Yes. For character data, double delimiters are used if there are
> delimiter characters inside the data, i.e. if a column contains
> 'he said "derby is a solid database"', then it is written to the export
> file as "he said ""derby is a solid database""". So on export, the
> data is modified before it is written to the file.
>
> It may be possible to do the same thing by interpreting the binary data
> in a specified code-set, adding an extra delimiter for every delimiter
> found on export and doing the reverse on import. But unlike character
> data, if the binary data is changed and the user imports it into some other
> application, the data may mean/look completely different if the added
> extra delimiter characters are not removed.

Again, I think the separate file/no delimiter solution is a good first
approach; I just wanted to understand the issue. As you point out, there
are multiple usage scenarios here:

1) someone has a derby db and wants to export for use in another
   derby db.
2) someone has a derby db and wants to export for use in another
   application.
3) someone has some data from another app and wants to import into
   derby.

I think the separate file solution works for #1. I don't know how well
it works for #2 and #3, but at least for #2 it yields the raw data
with no need to process it.

>
> Another thing to note here is that with character data the user knows there
> might be delimiter characters inside and can specify a delimiter character
> that is not in the data, if he/she does not want the data to be modified on
> export. Whereas with binary data that is not possible at all.
>
> If you think modifying binary data on export is ok, then it might be
> possible to use the same concept as for character data.
> My only concern here is
> if we export an image of a "cat" and add delimiters to it by interpreting
> it as character data, it may look like a "tiger" if the extra characters
> added are not removed :-)
>
> Thanks
> -suresh
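
To make the double-delimiter idea concrete, here is a rough sketch of the scheme described in this thread. It is only an illustration under my own assumptions, not Derby's import/export code: the class name DelimiterSketch and the helpers escape and unescape are hypothetical, and ISO-8859-1 is picked only because it maps every byte to exactly one character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Derby's actual import/export code.
class DelimiterSketch {

    // Export side: enclose the field in the delimiter and double every
    // delimiter character found inside it.
    static String escape(String field, char delim) {
        String d = String.valueOf(delim);
        return d + field.replace(d, d + d) + d;
    }

    // Import side: strip the enclosing delimiters and collapse each
    // doubled delimiter back to a single one.
    static String unescape(String exported, char delim) {
        String d = String.valueOf(delim);
        String inner = exported.substring(1, exported.length() - 1);
        return inner.replace(d + d, d);
    }

    // Treat raw bytes as characters (assumed code-set: ISO-8859-1) and
    // apply the same escaping, returning the bytes that would be written.
    static byte[] escapeBinary(byte[] raw, char delim) {
        String asText = new String(raw, StandardCharsets.ISO_8859_1);
        return escape(asText, delim).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "he said \"derby is a solid database\"";
        String exported = escape(original, '"');
        System.out.println(exported);
        System.out.println(unescape(exported, '"').equals(original));

        // Two of these four bytes happen to equal the delimiter (0x22),
        // so the exported form no longer matches the raw bytes.
        byte[] raw = {0x01, 0x22, 0x7F, 0x22};
        byte[] written = escapeBinary(raw, '"');
        System.out.println(raw.length + " raw bytes, " + written.length + " written");
    }
}
```

The character case round-trips because import undoes what export did. The binary case shows the concern raised above: any consumer that reads the exported bytes without running the reverse step sees extra 0x22 bytes that were never in the original data, which is what the separate-file approach avoids.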