db-derby-dev mailing list archives

From Daniel John Debrunner <...@apache.org>
Subject Re: Collation implementation WAS Re: Should COLLATION attribute related code go in BasicDatabase?
Date Thu, 15 Mar 2007 21:39:42 GMT
Mike Matrigali wrote:
> 
> 
> Rick Hillegas wrote:
> 
>>>
>>> Thanks, Mike. This overhead seems pretty small to me. It's hard for 
>>> me to predict whether this is useful generality or over-design.
>>>
>>> In the SQL standard, collations can be declared per column. That 
>>> affects index descriptors. In addition, via CASTs, collations can be 
>>> declared per sortable expression in an ORDER BY clause. That affects 
>>> the sorter. I'm not the person scratching this initial itch. I just 
>>> want to register my instinct to design-in the generality up front. I 
>>> think this has two advantages:
>>>
>>> 1) It will remove an upgrade issue later on when someone wants to 
>>> implement more of the SQL collation support.
>>>
>>> 2) It generally lowers the barrier to implementing more of the standard.
>>>
>>> Regards,
>>> -Rick
>>
> I am just not sure how comfortable I feel forcing an upgrade issue on a
> developer for a particular feature that is not their itch.   Mamta is 
> trying to solve single collation database problem, not full SQL 
> collation support.

Several factors come into play here; one is the long-term
maintainability of the code. I think that trumps any single developer's
itch. The developer can work with the community to come up with a
solution that keeps a good balance between what the community sees as
maintainability and scratching their itch.

I'm actually trying to save the contributor (Mamta) work here. I think
changing all the locations that generate character values to have the
correct "new-character-type" is a huge amount of work and prone to errors
(just from the number of changes and the interesting situations). E.g. in
some situations a literal will be a CHAR (sorting by ucs_basic) and in
others a CHAR (sorting by locale). That decision may not be possible
until very late in bind time, and may not even matter, even though the
code would have to pick one. Only caring about this when collation is
involved may make it easier.
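
To make the two orderings concrete, here is a minimal sketch in plain Java
(not Derby code). It assumes ucs_basic can be approximated by
String.compareTo, which compares UTF-16 code units, and that "sorting by
locale" can be approximated by java.text.Collator. The class and method
names are hypothetical and exist only for illustration.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration class; not part of Derby.
final class CollationOrderSketch {

    // Approximates ucs_basic: String's natural order compares UTF-16 code units,
    // so every uppercase ASCII letter sorts before every lowercase one.
    static String[] sortUcsBasic(String... values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Approximates locale-based ordering using the JDK's Collator for that locale.
    static String[] sortByLocale(Locale locale, String... values) {
        String[] copy = values.clone();
        Collator collator = Collator.getInstance(locale);
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        String[] data = {"b", "a", "B", "A"};
        System.out.println("ucs_basic-like: " + Arrays.toString(sortUcsBasic(data)));
        System.out.println("locale-based:   " + Arrays.toString(sortByLocale(Locale.ENGLISH, data)));
    }
}
```

The same CHAR values come out in a different order depending on which
comparison is used, which is why a literal's collation only matters once
it reaches a comparison or a sort.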

> Your suggestion may get us more there, not arguing that.  But a solution
> shorter along an agreed upon direction seems fine to me, and I would not
> hold up a developer contribution that did that.  If the community feels
> that 4 new classes is ok, but 4 new types is not the right direction then
> it is reasonable to work with the community to get the direction right.
> 
> I am waiting on Dan's reply as I think there are SYSTABLES and/or 
> SYSCOLUMNS metadata changes necessary that haven't been discussed.  

No changes to SYSTABLES or SYSCOLUMNS are needed for what Mamta is
proposing. Supporting per-schema collation would probably need some
change, though strangely enough per-column would probably not.

Possibly changing the way TypeDescriptorImpl writes itself out to disk 
would be needed, but there's enough room in the current format to store 
the collation information by overloading the on-disk space occupied by 
scale, since scale is always zero for a character type.
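
A rough sketch of that idea follows, using a made-up three-field format
rather than the real TypeDescriptorImpl layout. The class, its fields, and
the collation constants are all hypothetical. It also assumes that the
value 0 would mean the default (ucs_basic) collation, so that character
types already on disk with scale zero would read back unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a type descriptor; NOT Derby's actual on-disk format.
final class TypeFormatSketch {
    // Assumed encoding: 0 is the default, so old rows with scale == 0 still decode.
    static final int COLLATION_UCS_BASIC = 0;
    static final int COLLATION_TERRITORY = 1;

    final boolean isCharacterType;
    final int precision;
    final int scale;     // meaningful only for numeric types
    final int collation; // meaningful only for character types

    TypeFormatSketch(boolean isCharacterType, int precision, int scale, int collation) {
        this.isCharacterType = isCharacterType;
        this.precision = precision;
        this.scale = scale;
        this.collation = collation;
    }

    // Writes the same three fields for every type; character types reuse the
    // scale slot for the collation, so the record size does not change.
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeBoolean(isCharacterType);
            out.writeInt(precision);
            out.writeInt(isCharacterType ? collation : scale);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the shared slot back as collation or scale depending on the type.
    static TypeFormatSketch fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            boolean isChar = in.readBoolean();
            int precision = in.readInt();
            int slot = in.readInt();
            return isChar
                    ? new TypeFormatSketch(true, precision, 0, slot)
                    : new TypeFormatSketch(false, precision, slot, 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TypeFormatSketch varchar = new TypeFormatSketch(true, 32, 0, COLLATION_TERRITORY);
        TypeFormatSketch copy = fromBytes(varchar.toBytes());
        System.out.println("collation=" + copy.collation + " scale=" + copy.scale);
    }
}
```

In this sketch a numeric type and a character type serialize to the same
number of bytes, which is the property that would avoid a format change.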

Dan.

