db-derby-dev mailing list archives

From "Mamta Satoor" <msat...@gmail.com>
Subject Re: Collation implementation WAS Re: Should COLLATION attribute related code go in BasicDatabase?
Date Thu, 15 Mar 2007 17:05:44 GMT
Dan, I understand your concern that changes would be required in many places
in the code to make sure that we get the right instance of the character
datatype, i.e. SQLChar vs. CollatorSQLChar.

But I don't understand the following comment:
"My first thought is that this doesn't scale and doesn't seem like an OO
solution. To think ahead this means any additional collation style will
also add four new datatypes, which means there could easily be sixteen
or more datatypes to represent the characters. Each datatype will come
with some code cost, classes and/or methods per type."

For any additional collation style, CollatorSQLChar will just need to be
instantiated with the proper RuleBasedCollator object for that collation
style. We wouldn't need to create four new datatypes for every new collation
style introduced in Derby. Can you elaborate on what you mean by the comment
above?
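
To illustrate what I mean, here is a minimal sketch. It is not Derby's actual
code: CollatedChar is a hypothetical stand-in for CollatorSQLChar, and the two
rule strings are made-up collation styles. The point is only that one class,
handed a different RuleBasedCollator, covers each new style.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class CollationSketch {
    // Hypothetical stand-in for CollatorSQLChar: one class for any collation style.
    static final class CollatedChar {
        private final String value;
        private final RuleBasedCollator collator;

        CollatedChar(String value, RuleBasedCollator collator) {
            this.value = value;
            this.collator = collator;
        }

        // Ordering is delegated entirely to the collator supplied at construction.
        int compare(CollatedChar other) {
            return collator.compare(value, other.value);
        }
    }

    // Builds a collator from the given rules and compares two strings with it.
    static int compareWithRules(String rules, String left, String right) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            return new CollatedChar(left, collator).compare(new CollatedChar(right, collator));
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // Same class, two illustrative collation styles, opposite orderings.
        System.out.println(compareWithRules("< a < b < c", "a", "c") < 0); // true
        System.out.println(compareWithRules("< c < b < a", "a", "c") > 0); // true
    }
}
```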

Mamta


On 3/15/07, Daniel John Debrunner <djd@apache.org> wrote:
>
> Mamta Satoor wrote:
> > Ok, so I spent some time trying to move COLLATION attribute code from
> > DataDictionaryImpl.boot to DataValueFactoryImpl.boot. I thought I could
> > simply put following piece of code in DataValueFactoryImpl.boot method
> > and the Property.COLLATION will get saved in the properties
> > conglomerate.
>
> I think some of this goes back to the intended implementation.
>
> The intended implementation seems to be that there will be variants of
> the four character datatypes with locale based collation. This is four
> new (internal) datatypes in Derby that share most code with the existing
> CHAR, VARCHAR, LONG VARCHAR and CLOB types.
>
> I'm not sure this is the correct approach.
>
> My first thought is that this doesn't scale and doesn't seem like an OO
> solution. To think ahead this means any additional collation style will
> also add four new datatypes, which means there could easily be sixteen
> or more datatypes to represent the characters. Each datatype will come
> with some code cost, classes and/or methods per type.
>
> My second concern is that many places get characters and the change must
> ensure they get the correct datatype. Apart from potentially being a lot
> of work, the chance of missing some or picking the wrong character type
> seems high.
>
> What is really required is 'character type + collation'. I've been
> thinking that looking at the problem in this way may make it more
> manageable and easier to contain, with the main idea being to worry about
> the collation type only when actually performing a collation. So some
> initial ideas:
>
> - collation is an attribute of DataTypeDescriptor, not valid for non-
> character types, 0 for UCS_BASIC, 1 for UNICODE etc.
>        int getCollationType();
>
> - A method on DataValueFactory, returns null if type is UCS_BASIC
>        RuleBasedCollator getCharacterCollator(int type)
>
> - A method on StringDataValue
>        StringDataValue getValue(RuleBasedCollator collator)
>
>        For SQLChar:
>             getValue(null) would return itself
>             getValue(non-null) would return a new CollatorSQLChar() with
> the value of the SQLChar and the collator set.
>
>        For CollatorSQLChar:
>            getValue(null) would return a new SQLChar() with the value
> of the CollatorSQLChar
>            getValue(non-null) would return itself with the collator set
> correctly.
>
> - The collation type (the integer) is written into the meta-data for an
> index just as ascending/descending is today (including the btree control
> row, thus making the information available for recovery). Collation type
> applies to all character columns in the index.
>
> - At SQL compilation time, the code generation sets up the various types
> correctly using the new methods.
>
> - At recovery time the btree uses the collation type and the data value
> factory to set up its template row array correctly. Something like
>      for each dvd in row array
>         if (dvd instanceof StringDataValue)
>              dvd = dvd.getValue(dvf.getCharacterCollator(type));
>
> - setting the collation property remains in the data dictionary
>
> - basic database sets the locale for the DataValueFactory after it boots
> it, using a new method on DVF
>         void setLocale(Locale locale);
>
> I think approaching the problem this way will lead to a cleaner solution
> in the long term and be somewhat easier to implement.
>
> Thanks,
> Dan.
>
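
For reference, the ideas quoted above can be put together in a rough sketch.
This is a simplified illustration under stated assumptions, not Derby's real
interfaces: the types below are cut-down stand-ins that reuse the names from
the proposal, the constants UCS_BASIC and UNICODE and the helper
applyCollation are hypothetical, String.compareTo stands in for UCS_BASIC
ordering, and the factory simply takes the JDK's collator for the locale.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class GetValueSketch {
    static final int UCS_BASIC = 0; // hypothetical constants for the collation type
    static final int UNICODE = 1;

    // Cut-down stand-in for StringDataValue; only what the proposal mentions.
    interface StringDataValue {
        String getString();
        StringDataValue getValue(RuleBasedCollator collator);
        int compare(StringDataValue other);
    }

    static final class SQLChar implements StringDataValue {
        private final String value;
        SQLChar(String value) { this.value = value; }
        public String getString() { return value; }
        // null collator: stay as is; otherwise switch to the collating variant.
        public StringDataValue getValue(RuleBasedCollator collator) {
            return collator == null ? this : new CollatorSQLChar(value, collator);
        }
        public int compare(StringDataValue other) {
            return value.compareTo(other.getString());
        }
    }

    static final class CollatorSQLChar implements StringDataValue {
        private final String value;
        private RuleBasedCollator collator;
        CollatorSQLChar(String value, RuleBasedCollator collator) {
            this.value = value;
            this.collator = collator;
        }
        public String getString() { return value; }
        // null collator: fall back to a plain SQLChar; otherwise adopt the collator.
        public StringDataValue getValue(RuleBasedCollator collator) {
            if (collator == null) {
                return new SQLChar(value);
            }
            this.collator = collator;
            return this;
        }
        public int compare(StringDataValue other) {
            return collator.compare(value, other.getString());
        }
    }

    static final class DataValueFactory {
        private Locale locale = Locale.getDefault();
        void setLocale(Locale locale) { this.locale = locale; }
        // Returns null for UCS_BASIC, a locale-based collator otherwise.
        RuleBasedCollator getCharacterCollator(int type) {
            if (type == UCS_BASIC) {
                return null;
            }
            Collator c = Collator.getInstance(locale);
            if (!(c instanceof RuleBasedCollator)) {
                throw new IllegalStateException("no rule-based collator for " + locale);
            }
            return (RuleBasedCollator) c;
        }
    }

    // The recovery-time loop: only string values in the template row are touched.
    static void applyCollation(Object[] row, int type, DataValueFactory dvf) {
        RuleBasedCollator collator = dvf.getCharacterCollator(type);
        for (int i = 0; i < row.length; i++) {
            if (row[i] instanceof StringDataValue) {
                row[i] = ((StringDataValue) row[i]).getValue(collator);
            }
        }
    }

    public static void main(String[] args) {
        DataValueFactory dvf = new DataValueFactory();
        dvf.setLocale(Locale.ENGLISH);
        Object[] row = { new SQLChar("abc"), Integer.valueOf(7) };
        applyCollation(row, UNICODE, dvf);
        System.out.println(row[0].getClass().getSimpleName()); // CollatorSQLChar
        applyCollation(row, UCS_BASIC, dvf);
        System.out.println(row[0].getClass().getSimpleName()); // SQLChar
    }
}
```

In this sketch the collation type is consulted in exactly one place, the
factory, and callers that never perform a comparison keep working with plain
SQLChar values.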
