db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Collation implementation WAS Re: Should COLLATION attribute related code go in BasicDatabase?
Date Thu, 15 Mar 2007 22:45:06 GMT


Daniel John Debrunner wrote:
> Mike Matrigali wrote:
> 
>>
>>
>> Rick Hillegas wrote:
>>
>>>>
>>>> Thanks, Mike. This overhead seems pretty small to me. It's hard for 
>>>> me to predict whether this is useful generality or over-design.
>>>>
>>>> In the SQL standard, collations can be declared per column. That 
>>>> affects index descriptors. In addition, via CASTs, collations can be 
>>>> declared per sortable expression in an ORDER BY clause. That affects 
>>>> the sorter. I'm not the person scratching this initial itch. I just 
>>>> want to register my instinct to design-in the generality up front. I 
>>>> think this has two advantages:
>>>>
>>>> 1) It will remove an upgrade issue later on when someone wants to 
>>>> implement more of the SQL collation support.
>>>>
>>>> 2) It generally lowers the barrier to implementing more of the 
>>>> standard.
>>>>
>>>> Regards,
>>>> -Rick
>>>
>>>
>> I am just not sure how comfortable I feel forcing an upgrade issue on a
>> developer for a particular feature that is not their itch.   Mamta is 
>> trying to solve single collation database problem, not full SQL 
>> collation support.
> 
> 
> There are a number of factors that come in; one is the long-term
> maintainability of the code. I think that trumps any single developer's
> itch. The developer can work with the community in coming up with a
> solution that keeps a good balance between what the community sees as
> maintainability and scratching their itch.
> 
> I'm actually trying to save the contributor (Mamta) work here, I think 
> changing all the locations that generate characters to have the correct 
> "new-character-type" is a huge amount of work and subject to errors 
> (just from the amount of changes and interesting situations). E.g. in 
> some situations a literal will be a CHAR (sorting by ucs_basic) and 
> others a CHAR (sorting by locale). That decision may not be able to be
> made until very late in the bind time, and may not even matter,
> even though code would have to pick one. Only caring about this when
> collation is involved may make it easier.
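
To make the ucs_basic versus locale distinction above concrete, here is a minimal Java sketch. It is not Derby code: the class and method names are hypothetical, and it only uses the JDK's `String.compareTo` and `java.text.Collator` to show that the same two CHAR values can order differently depending on which collation applies.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration only; not part of Derby.
final class CollationOrderSketch {

    // UCS_BASIC-style ordering: compare by character code values.
    // Assumption: String.compareTo compares UTF-16 code units, which
    // matches code point order for text without supplementary characters.
    static int compareUcsBasic(String a, String b) {
        return a.compareTo(b);
    }

    // Locale-sensitive ordering using the JDK Collator for that locale.
    static int compareByLocale(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // 'a' (97) sorts after 'B' (66) by code value...
        System.out.println(compareUcsBasic("a", "B") > 0);
        // ...but before "B" under English collation rules.
        System.out.println(compareByLocale("a", "B", Locale.ENGLISH) < 0);
    }
}
```

The point of the sketch is that the comparison routine, not the stored bytes, is what differs between the two kinds of CHAR.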

I obviously don't know "all the places", so it is not clear to me why
some of the places don't have to change.  It is not clear to me why, in
the new proposal, one does not have to change all the locations that
generate characters to have the correct "new-collation-type".  I think
this is because I don't understand the runtime usages.  Am I at least
right about the following locations where we persist the columns?  If
we get the right info into them when we persist them, then we can get
the right info into them when we read them back.

The main places I think about are the persistent ones:
1) system catalog creation code.
    o I assume you still have to change this code so that when the
      character columns are created, they get the proper info stored
      in the metadata.

language metadata:
     old proposal: an old typeid, 0 in scale.
     new proposal: an old typeid, 0 in scale.

store metadata:
     old proposal: an old typeid, no new metadata.
     new proposal: an old typeid, some new metadata, maybe per-column,
                   maybe per-conglomerate.

2) user table creation code.
    o I assume you still have to change this code so that when the
      character columns are created, they get the proper info stored
      in the metadata.

language metadata:
     old proposal: a new typeid if collated db, 0 in scale.
     new proposal: an old typeid, new collate id in scale if collated db.

store metadata:
     old proposal: a new typeid if collated db, no new metadata.
     new proposal: an old typeid, some new metadata, maybe per-column,
                   maybe per-conglomerate.

In the new proposal it looks to me like new collation metadata is stored
on a per-column basis in language.

> 
>> Your suggestion may get us more there, not arguing that.  But a solution
>> shorter along an agreed upon direction seems fine to me, and I would not
>> hold up a developer contribution that did that.  If the community
>> feels that 4 new classes is ok, but 4 new types is not the right
>> direction, then it is reasonable to work with the community to get
>> the direction right.
>>
>> I am waiting on Dan's reply as I think there are SYSTABLES and/or 
>> SYSCOLUMNS metadata changes necessary that haven't been discussed.  
> 
> 
> No changes to SYSTABLES or SYSCOLUMNS are needed for what Mamta is
> proposing. Supporting per-schema collation would probably need some
> change, though strangely enough per-column would probably not.
> 
> Possibly changing the way TypeDescriptorImpl writes itself out to disk 
> would be needed, but there's enough room in the current format to store 
> the collation information by overloading the on-disk space occupied by 
> scale, since scale is always zero for a character type.
> 
Ok, so effectively language will store collation information on a
per-column basis.  10.3 will interpret 0 as representing UCS_BASIC, and
some to-be-defined method will assign other values for other collations.
We will need to make sure there aren't any JDBC calls that blindly
return scale currently for character types.
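
A rough Java sketch of that idea, under stated assumptions: the class, the method names, and the non-zero collation id are all hypothetical, and this is not how TypeDescriptorImpl is actually written. It only shows the shape of reusing the scale slot for character types while keeping the externally visible scale at 0.

```java
import java.sql.Types;

// Hypothetical illustration only; not Derby's TypeDescriptorImpl.
final class ScaleOverloadSketch {

    // Per the discussion, 0 in the scale slot would mean UCS_BASIC.
    static final int COLLATION_UCS_BASIC = 0;
    // Assumed value for some other collation; the real assignment
    // scheme is still to be defined.
    static final int COLLATION_OTHER = 1;

    static boolean isCharacterType(int jdbcType) {
        return jdbcType == Types.CHAR
            || jdbcType == Types.VARCHAR
            || jdbcType == Types.LONGVARCHAR
            || jdbcType == Types.CLOB;
    }

    // For character types the stored scale slot carries the collation id;
    // non-character types have no collation, so report the default.
    static int collationOf(int jdbcType, int storedScaleSlot) {
        return isCharacterType(jdbcType) ? storedScaleSlot : COLLATION_UCS_BASIC;
    }

    // What a metadata call should report as scale: always 0 for character
    // types, so the overloaded slot never leaks out to callers.
    static int visibleScale(int jdbcType, int storedScaleSlot) {
        return isCharacterType(jdbcType) ? 0 : storedScaleSlot;
    }
}
```

With this shape, any path that reports scale for a column would go through something like `visibleScale` rather than reading the stored slot directly, which is the check on the JDBC side that would be needed.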

> Dan.
> 
> 
> 

