db-derby-dev mailing list archives

From Daniel John Debrunner <...@apache.org>
Subject Re: Collation implementation WAS Re: Should COLLATION attribute related code go in BasicDatabase?
Date Fri, 16 Mar 2007 00:46:27 GMT
Mike Matrigali wrote:

> physically I am not sure the best way to store it.
> Are we sure the collation id can be represented as an INT?  I may have
> missed it but do we expect a different number here for each different
> language, or is there a single number that says sort based on language
> and go look up language somewhere else?

Single values; the locale is fixed by the database:

0 - collation using code point order (UCS_BASIC)
1 - collation using the locale of the current database ("unicode")
2 - collation using LOWER() with the locale of the current database 
3 - collation using UPPER() with the locale of the current database 

These map to the way SQL does it, which is fixed names for collations.

Now I guess in theory there could be future additions of:
   collate according to a specific locale (e.g. French) in a database of a 
different locale.
   collate according to a user defined class

My guess is that these could be handled with an integer and indirection. The 
DataValueFactory would assign values dynamically within a database, so 
it would use 100 for locale french and also store in service.properties 
the mapping between collation 100 and locale french. And of course in a 
different database 100 might mean collate using com.acme.myapp.MyCollator.

So I think single values will suffice.
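
A minimal sketch of that indirection, to make the idea concrete. Everything here is hypothetical: the class name CollationRegistry, the "collation." property key prefix, and the "locale:fr" / "class:..." target strings are assumptions for illustration, not existing Derby code. A plain java.util.Properties stands in for service.properties.

```java
import java.util.Properties;

// Hypothetical sketch: fixed collation ids 0-3, plus per-database ids from 100
// that are resolved through a properties mapping (standing in for
// service.properties). None of these names exist in Derby.
class CollationRegistry {
    static final int UCS_BASIC = 0;     // code point order
    static final int LOCALE = 1;        // locale of the current database
    static final int LOCALE_LOWER = 2;  // LOWER() with the database locale
    static final int LOCALE_UPPER = 3;  // UPPER() with the database locale
    static final int FIRST_DYNAMIC_ID = 100;     // assumed start of dynamic range
    static final String KEY_PREFIX = "collation."; // assumed property key prefix

    private final Properties props;
    private int nextId = FIRST_DYNAMIC_ID;

    CollationRegistry(Properties props) {
        this.props = props;
        // Resume after the highest id already recorded for this database.
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(KEY_PREFIX)) {
                int id = Integer.parseInt(key.substring(KEY_PREFIX.length()));
                nextId = Math.max(nextId, id + 1);
            }
        }
    }

    // Returns the id for a target such as "locale:fr", assigning a new one
    // within this database if the target has not been seen before.
    int register(String target) {
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(KEY_PREFIX) && target.equals(props.getProperty(key))) {
                return Integer.parseInt(key.substring(KEY_PREFIX.length()));
            }
        }
        int id = nextId++;
        props.setProperty(KEY_PREFIX + id, target);
        return id;
    }

    // Resolves an id to a description; dynamic ids are looked up per database.
    String describe(int id) {
        switch (id) {
            case UCS_BASIC:    return "UCS_BASIC";
            case LOCALE:       return "database locale";
            case LOCALE_LOWER: return "LOWER() with database locale";
            case LOCALE_UPPER: return "UPPER() with database locale";
            default:
                String target = props.getProperty(KEY_PREFIX + id);
                if (target == null) {
                    throw new IllegalArgumentException("unknown collation id " + id);
                }
                return target;
        }
    }
}
```

With this sketch, two registries built over different Properties objects can both hand out 100, each meaning something different, which is the per-database behaviour described above.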

> options include:
> 1) most straightforward would be an array with an entry for each column 
>  whether it is character or not. If we use compressedInteger format we 
> can get away with only 1 byte per "null" entry.  Note on the way out it
> is easy to tell if it is a character, but on the way back we only have
> format id's.  I was hoping to have a single call to datafactory(format 
> id, collate id) and get back the correct object.
> Will it ever make sense to associate a collation with something other
> than a character type?

Not that I can think of, and I think an int range provides for lots of 

> 2) some sort of encoded sparse index with entries only for the character 
> columns (anyone know if there is a java utility to do this)?  The 
> downside is that this usually means even more data stored than option 1
> in some cases.

One option is if there are no character columns don't have the array.
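
A rough sketch of that variant of option 1, assuming a one-byte presence flag followed by one entry per column. The class name ColumnCollationCodec and the NOT_CHARACTER sentinel are made up for illustration, and plain writeInt is used here rather than the compressed integer format mentioned above, so the per-entry size differs from what a real implementation would store.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch: store per-column collation ids only when at least one
// column is a character column. Not Derby's actual on-disk format.
class ColumnCollationCodec {
    static final int NOT_CHARACTER = -1; // assumed sentinel for non-character columns

    static byte[] write(int[] collationIds) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            boolean hasCharacterColumn = false;
            for (int id : collationIds) {
                if (id != NOT_CHARACTER) {
                    hasCharacterColumn = true;
                }
            }
            // One flag byte; the array follows only if it carries information.
            out.writeBoolean(hasCharacterColumn);
            if (hasCharacterColumn) {
                out.writeInt(collationIds.length);
                for (int id : collationIds) {
                    out.writeInt(id);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int[] read(byte[] data, int columnCount) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int[] ids = new int[columnCount];
            Arrays.fill(ids, NOT_CHARACTER); // default when no array was stored
            if (in.readBoolean()) {
                int stored = in.readInt();
                if (stored != columnCount) {
                    throw new IOException("column count mismatch: " + stored);
                }
                for (int i = 0; i < stored; i++) {
                    ids[i] = in.readInt();
                }
            }
            return ids;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a table with no character columns costs a single byte, whatever its column count.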

> 3) some sort of format that on read would depend on first getting an
> uncollated datatype of type format-id and then regetting it based on
> some code.  So maybe some extra object creation and extra cpu overhead
> to create the template in readExternal.

Not sure how this would work.

