db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Collation implementation WAS Re: Should COLLATION attribute related code go in BasicDatabase?
Date Thu, 15 Mar 2007 23:42:58 GMT


Daniel John Debrunner wrote:
> Mike Matrigali wrote:
> 
>> Ok, so effectively language will store collation information on a
>> per-column basis.  10.3 will interpret 0 as representing UCS_BASIC, and
>> some to-be-defined method will assign other values for other collations.
>> We will need to make sure there aren't any JDBC calls that currently
>> blindly return scale for character types.
> 
> 
> I had to rush the last e-mail about scale since I had to pick my son up 
> from school, so sorry for that.
> 
> I'm not saying that DataTypeDescriptor.getScale() for a character column
> changes in any way. Its API remains the same, which is to return
> zero for any character column.
> 
> However for a character datatype we could use the space on-disk that 
> scale currently occupies to write collation information, since it's 
> always written as zero currently for characters. So the writeExternal() 
> would have something like (not actual methods)
> 
>    if (i_am_character_type)
>      out.writeInt(collation);
>    else
>      out.writeInt(scale);
> 
> 
> and the readExternal
> 
>    int v = in.readInt();
>    if (i_am_character_type)
>    {
>       collation = v;
>       scale = 0;
>    }
>    else
>    {
>       scale = v;
>    }
> 
> Hope that clears that up.
> Dan.
Thanks, that is what I thought.  I didn't really consider how the
metadata would be returned for scale; it is probably still worth making
sure we test the metadata scale call in a collated db.
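The slot-sharing idea in Dan's quoted pseudocode can be sketched as runnable Java. This is a minimal illustration under stated assumptions: `TypeDesc` is a hypothetical stand-in, not Derby's `DataTypeDescriptor`; the reader is assumed to already know whether the column is a character type (in practice that would come from type information read earlier in the stream); and collation id 0 is assumed to mean UCS_BASIC, as proposed above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class CollationSlotSketch {

    // Hypothetical stand-in for a type descriptor (NOT Derby's DataTypeDescriptor).
    static final class TypeDesc {
        final boolean isCharacterType;
        final int scale;      // meaningful only for non-character types
        final int collation;  // meaningful only for character types; 0 assumed to mean UCS_BASIC

        TypeDesc(boolean isCharacterType, int scale, int collation) {
            this.isCharacterType = isCharacterType;
            this.scale = isCharacterType ? 0 : scale;
            this.collation = isCharacterType ? collation : 0;
        }

        // Public API unchanged: character columns still report a scale of 0.
        int getScale() {
            return scale;
        }

        // Both branches write exactly one int, so the on-disk size does not change.
        void writeExternal(DataOutput out) throws IOException {
            out.writeInt(isCharacterType ? collation : scale);
        }

        // Assumes the caller already knows whether this is a character type.
        static TypeDesc readExternal(boolean isCharacterType, DataInput in) throws IOException {
            int v = in.readInt();
            return isCharacterType
                    ? new TypeDesc(true, 0, v)
                    : new TypeDesc(false, v, 0);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        new TypeDesc(true, 0, 1).writeExternal(out);   // character column, hypothetical collation id 1
        new TypeDesc(false, 2, 0).writeExternal(out);  // numeric column with scale 2
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        TypeDesc chars = TypeDesc.readExternal(true, in);
        TypeDesc numeric = TypeDesc.readExternal(false, in);

        System.out.println(bytes.size());                                // 8
        System.out.println(chars.collation + " " + chars.getScale());   // 1 0
        System.out.println(numeric.scale + " " + numeric.getScale());   // 2 2
    }
}
```

Only the interpretation of the slot differs between the two branches, and `getScale()` keeps returning 0 for character columns, which is exactly the metadata behavior worth testing in a collated database.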

I am just getting clear in my mind what we are doing with language
metadata in the proposal.  Since we are writing per-column metadata for
collation in language, it is harder for me to argue against per-column
metadata in store.


Physically, I am not sure of the best way to store it.

Are we sure the collation id can be represented as an INT?  I may have
missed it, but do we expect a different number here for each different
language, or is there a single number that says "sort based on language"
and we go look up the language somewhere else?

Options include:
1) The most straightforward would be an array with an entry for each
column, whether it is a character type or not.  If we use compressedInteger
format we can get away with only 1 byte per "null" entry.  Note that on
the way out it is easy to tell whether a column is a character type, but
on the way back in we only have format ids.  I was hoping to have a single
call to datafactory(format id, collate id) and get back the correct object.

Will it ever make sense to associate a collation with something other
than a character type?

2) Some sort of encoded sparse index with entries only for the character
columns (does anyone know if there is a Java utility to do this?).  The
downside is that in some cases this means even more data stored than
option 1.

3) Some sort of format that, on read, would depend on first getting an
uncollated datatype for the format id and then re-getting it based on
the stored collation code.  That probably means some extra object creation
and extra CPU overhead to create the template in readExternal.
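To make the size tradeoff between options 1 and 2 concrete, here is a rough Java sketch. It uses a LEB128-style variable-length integer as a stand-in for a compressed integer format (it is not Derby's actual compressed format, but like the format mentioned in option 1 it stores a 0 "null" entry in one byte). All class and method names are hypothetical, and non-character columns are assumed to be written as 0 in the dense form, with the reader relying on format ids to know which entries to ignore.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class CollationArraySketch {

    /** LEB128-style unsigned varint: 7 bits per byte, high bit set on all but the last byte. */
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Option 1: one entry per column, character type or not (0 for non-character columns). */
    static byte[] encodeDense(int[] collationIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int id : collationIds) {
            writeVarInt(out, id);
        }
        return out.toByteArray();
    }

    /** Option 2: a count, then (column index, collation id) pairs for character columns only. */
    static byte[] encodeSparse(int[] collationIds, boolean[] isCharacter) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int count = 0;
        for (boolean c : isCharacter) {
            if (c) count++;
        }
        writeVarInt(out, count);
        for (int col = 0; col < collationIds.length; col++) {
            if (isCharacter[col]) {
                writeVarInt(out, col);
                writeVarInt(out, collationIds[col]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int n = 10;

        // Table A: every column is a character column with hypothetical collation id 1.
        int[] allIds = new int[n];
        boolean[] allChar = new boolean[n];
        Arrays.fill(allIds, 1);
        Arrays.fill(allChar, true);

        // Table B: only column 3 is a character column.
        int[] oneIds = new int[n];
        boolean[] oneChar = new boolean[n];
        oneIds[3] = 1;
        oneChar[3] = true;

        System.out.println(encodeDense(allIds).length + " vs " + encodeSparse(allIds, allChar).length); // 10 vs 21
        System.out.println(encodeDense(oneIds).length + " vs " + encodeSparse(oneIds, oneChar).length); // 10 vs 3
    }
}
```

With ten columns that are all character types, the dense array takes 10 bytes while the sparse form takes 21 (a count plus an index/id pair per column); with a single character column the sparse form drops to 3 bytes against the dense array's 10. That is the "more data than option 1 in some cases" downside of option 2 in numbers.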

