Reply-To: <derby-dev@db.apache.org>
Message-ID: <454A7F1B.5050202@sbcglobal.net>
Date: Thu, 02 Nov 2006 15:28:27 -0800
From: Mike Matrigali <mikem_app@sbcglobal.net>
Reply-To: mikem_app@sbcglobal.net
To: derby-dev@db.apache.org
Subject: Re: (DERBY-378) implementing  import/export of large objects...
References: <45410499.808@gmail.com> <45478117.9010101@apache.org>
 <45491975.9090908@gmail.com> <454A4D7C.2010805@sbcglobal.net>
 <454A7105.60300@gmail.com>
In-Reply-To: <454A7105.60300@gmail.com>


Suresh Thalamati wrote:
> Mike Matrigali wrote:
> 
>>
>>
>> Suresh Thalamati wrote:
>>
>>> Daniel John Debrunner wrote:
>>>
>>>> Suresh Thalamati wrote:
>>>>
>>>>> BLOB:
>>>
>>>
>>>
>>> ....
>>>
>>>>>   1) Allow import/export of blob data only when it is stored 
>>>>> in an external file.
>>>>>
>>>>>   2) Write the blob data to the export file along with other data 
>>>>> types during export, assuming the blob data does not contain any 
>>>>> delimiters, and throw an error on import if it finds delimiters 
>>>>> inside the data, by interpreting it using the same code-set as the 
>>>>> other character data.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> I say option 1) and I assume it's for all binary data, not just 
>>>> BLOB, e.g. VARCHAR FOR BIT DATA etc. Seems like with binary data the 
>>>> chance of having a problematic character in the text file is close 
>>>> to 100%.
>>>>
>>>>
>>>> Dan.
>>>>
>>>>
>>>
>>> Thanks for reading the proposal, Dan.  I agree with you, the chance of 
>>> finding a delimiter character inside binary data is very high. I 
>>> will go with option 1. Your assumption is correct, it applies to 
>>> all the binary data.
>>
>>
>>
>> I also agree, it seems like a reasonable first step to default binary 
>> export/import to external files.
>>
>> I am probably missing something here though.  I thought for char data 
>> there is an algorithm that handles delimiter data within it.  Why does
>> that not work for binary data also?  Is it a codeset issue?
>>
> 
> Yes. For character data, double delimiters are used if there are 
> delimiter characters inside the data, i.e. if a column contains
> 'he said "derby is a solid database"', then it is written to the export 
> file as "he said ""derby is a solid database""". So on export, the 
> data is modified before it is written to the file.
> 
> It may be possible to do the same thing by interpreting the binary data 
> in the specified code-set, adding an extra delimiter for every delimiter 
> found on export, and doing the reverse on import.  But unlike character 
> data, if the binary data is changed and the user imports it into some 
> other application, the data may mean/look completely different if the 
> added extra delimiter characters are not removed.

Again, I think the separate file/no delimiter solution is a good first
approach; I just wanted to understand the issue.  As you point out, there
are multiple usage scenarios here:
1) someone has a Derby db and wants to export for use in another Derby db.
2) someone has a Derby db and wants to export for use in another 
application.
3) someone has some data from another app and wants to import into Derby.

I think the separate file solution works for #1.  I don't know how well
it works for #2 and #3.  But at least for #2 it results in the
raw data without need to process it.
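
For reference, here is a minimal Java sketch of the double-delimiter handling for character data, as I understand it from the description quoted above. The class and method names are made up for illustration; this is not the actual Derby import/export code.

```java
// Hypothetical helper for illustration only, not Derby's real import/export classes.
final class DelimiterDoubling {

    // Export side: enclose the value in the delimiter and double every
    // delimiter character found inside it.
    static String escape(String value, char delim) {
        StringBuilder out = new StringBuilder().append(delim);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == delim) {
                out.append(delim); // extra delimiter added on export
            }
            out.append(c);
        }
        return out.append(delim).toString();
    }

    // Import side: strip the enclosing delimiters and collapse each doubled
    // delimiter back to a single one. A lone delimiter inside the field is an error.
    static String unescape(String field, char delim) {
        int last = field.length() - 1;
        if (last < 1 || field.charAt(0) != delim || field.charAt(last) != delim) {
            throw new IllegalArgumentException("field is not enclosed in delimiters");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < last; i++) {
            char c = field.charAt(i);
            if (c == delim) {
                if (i + 1 >= last || field.charAt(i + 1) != delim) {
                    throw new IllegalArgumentException("unescaped delimiter inside field");
                }
                i++; // skip the second half of the doubled delimiter
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

The round trip only restores the original value if the reader knows to undo the doubling, which is the case for scenario #1 but not necessarily for #2.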

> 
> Another thing to note here is that with character data the user knows 
> there might be delimiter characters inside and can specify a delimiter 
> character that is not in the data, if he/she does not want the data to be 
> modified on export. Whereas with binary data that is not possible at all.
> 
> If you think modifying binary data on export is ok, then it might be 
> possible to use the same concept as for character data.  My only concern 
> here is that if we export an image of a "cat" and add delimiters to it by 
> interpreting it as character data, it may look like a "tiger" if the 
> extra characters added are not removed :-)
> 
> Thanks
> -suresh
> 
>
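
To illustrate the "cat"/"tiger" concern, here is a small Java sketch of what delimiter doubling would do to raw bytes. It assumes the delimiter is a single byte (0x22, the double quote, in the usage below); the class name is hypothetical and this is not how Derby export is implemented.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only: applies delimiter doubling to binary data.
final class BinaryDoubling {

    // Returns a copy of data with every occurrence of the delimiter byte doubled.
    static byte[] doubleDelimiters(byte[] data, byte delim) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : data) {
            if (b == delim) {
                out.write(delim); // extra byte that was never in the original blob
            }
            out.write(b);
        }
        return out.toByteArray();
    }
}
```

A four-byte blob containing 0x22 twice becomes six bytes, so any application that reads the exported bytes without removing the extra ones sees different data than what was stored.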