db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel John Debrunner <...@apache.org>
Subject Re: Collation implementation WAS Re: Should COLLATION attribute related code go in BasicDatabase?
Date Thu, 15 Mar 2007 19:28:29 GMT
Daniel John Debrunner wrote:
> Mike Matrigali wrote:
>> Daniel John Debrunner wrote:
>>> Rick Hillegas wrote:
>>>> Daniel John Debrunner wrote:
>>>>> Rick Hillegas wrote:
>>>>>> Daniel John Debrunner wrote:
>>>>>>> ...
>>>>>>> - The collation type (the integer) is written into the meta-data

>>>>>>> for an index just as ascending/descending is today (including
>>>>>>> btree control row, thus making the information available for

>>>>>>> recovery). Collation type applies to all character columns in
>>>>>>> index.
>>>>>> This suggests that all of the columns in the index must have the

>>>>>> same collation? I don't think that is powerful enough to support

>>>>>> the full-blown SQL collation language, which allows you to mix 
>>>>>> differently collated columns in an ORDER BY clause. Why can't the

>>>>>> collation type be an array of ints just as the sort direction is

>>>>>> an array of booleans in IndexDescriptor?
>>>>> That would be more flexible, but is it required?
>>>> I believe so. I don't see any rule which requires one collation for 
>>>> all of the character expressions in an ORDER BY clause. There does 
>>>> seem to be a rule requiring one collation for both sides of a 
>>>> comparison, e.g., for both sides of a < operator.
>>> I understand ORDER BY with different collations is required, but I 
>>> meant  are multiple collations required in a single BTREE index, 
>>> which is where this meta data would be stored. With the plans for 
>>> DERBY-1478 it isn't, with new collations it isn't, with collation 
>>> per-schema it isn't, so I was wondering what would trigger it? If 
>>> it's not in the foreseeable future or an option through SQL then I 
>>> would say a simple single collation will work. Future expansion could 
>>> change it to be per-column when required.
>> This is where I get confused.  Are multiple collations required in a 
>> single database?  With plans for DERBY-1478 it isn't.  With new 
>> collations it isn't.
>> With collation per-schema it is, but should we pay overhead now for a
>> possible future, as long as we have a design that supports an upgrade
>> path to it?
>> I am not seeing the value in the argument for storing it once per
>> table vs. once per database today.
> That's true. If the store used this per-database value when setting up 
> the row arrays for an index then the path forward to per-schema 
> collations would be clear. In that case in the future the store could 
> override the per-database value with the value from the index meta data.
> This would then mean that the default collation type must be stored in 
> service.properties and thus be available to the store when it boots. The 
> reading of the attribute logic could continue to be in DataValueFactory 
> and should work similar to the code Mamta posted earlier.

Actually I think the above won't work, since 10.3 will have *two* 
collations per database. UCS_BASIC for system tables and locale based 
for user tables. Thus I think the information is needed at the index 
level unless store has a mechanism to figure out which are system indexes.

The different format identifier for the same type is what worries me, it 
seems there is ample opportunity to end up with the wrong CHAR data type 
and thus lead to bugs. Today's code handles this ok and it more obvious 
since there is a 1-1 mapping between:
        SQL type name (e.g. INTEGER) and derby internal type
        JDBC type name (e.g. Types.CHAR) and derby internal type

breaking that to be a 1-N mapping seems to be a huge risk. There is a 
lot of places in the code where a character datatype is created, can we 
guarantee to catch all of them? Limiting the exposure to collation 
related activity seems safer to me.


View raw message