db-derby-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "BuiltInLanguageBasedOrderingDERBY-1478" by MamtaSatoor
Date Mon, 02 Apr 2007 20:00:19 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by MamtaSatoor:
http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478

------------------------------------------------------------------------------
   
  In the original proposal, the intention was to introduce new internal CHAR datatype which
extended current CHAR datatype in Derby. This would have been implemented by having a new
format id associated with the new internal CHAR datatypes. But with that proposal, there was
overhead associated with implementing new getter methods in DataValueFactory for this new
internal datatype and the type compiler associated with the new internal datatype etc. The
other issue with the proposal was that there are many places in the code today where we get
character datatypes and all of those cases will have to be individually investigated to see
which CHAR datatype implementation they should use. So, if the character datatype is getting
instantiated for CHAR columns in system tables, then we should use existing CHAR datatype
implementation. But, if they were getting instantiated for user table, then the new internal
CHAR datatype should be instantiated. And there will be places where we can't determine which
of the two CHAR implementations we should use, e.g. a string literal such as 'abc' in a
query.
   
- The second proposal(current) was based on the idea that CHAR with territory based collation
differs from the CHAR with default collation in only one aspect and ie how they are collated.
Rest everything is same. So, as long as we know at the collation time, which kind of collation
we are dealing with, we should be fine and hence there is no need to generate new internal
CHAR datatypes. With that proposal, at compile time, when we associate a DataTypeDescriptor
(DTD) with a char column, we tell what kind of collation should be associated with that DTD
and how was that collation derived "collation derivation". If the collation derivation is
"none", then the collation type should be ignored. Otherwise, the collation type associated
with DTD can be UCS_BASIC/territory base. Char columns associated with SYS schemas will always
have UCS_BASIC in DTD associated with them. Char columns from user schema will have UCS_BASIC/territory
based depending on what user has requested through COLLATION attribute in the jdbc url at
database create time. Char columns that are not associated
with a specific schema will have their DTD marked with collation as described by the rules
in the section "Collation Determination" later on in this page. So, as you can see, collation
information will be saved at the column level in language layer. Store will follow the same
granularity and it will write the collation type for each and every column in it's metadata
(ie for char datatypes as well as non-char datatypes). This collation type will make sense
for only char datatypes. For the other datatypes, collation type will be ignored. 
+ The second proposal(current) was based on the idea that CHAR with territory based collation
differs from the CHAR with default collation in only one aspect and ie how they are collated.
Rest everything is same. So, as long as we know at the collation time, which kind of collation
we are dealing with, we should be fine and hence there is no need to generate new internal
CHAR datatypes. With that proposal, at compile time, when we associate a DataTypeDescriptor
(DTD) with a character string type, we tell what kind of collation should be associated with
that DTD and how was that collation derived "collation derivation". If the collation derivation
is "none", then the collation type should be ignored. Otherwise, the collation type associated
with DTD can be UCS_BASIC/territory base. Char columns associated with SYS schemas will always
have UCS_BASIC in DTD associated with them. Char columns from user schema will have UCS_BASIC/territory
based depending on what user has requested through COLLATION attribute in the jdbc url at
database create time. Char columns that
are not associated with a specific schema will have their DTD marked with collation as described
by the rules in the section "Collation Determination" later on in this page. So, as you can
see, collation information will be saved at the column level in language layer. Store will
follow the same granularity and it will write the collation type for each and every column
in it's metadata (ie for char datatypes as well as non-char datatypes). This collation type
will make sense for only char datatypes. For the other datatypes, collation type will be ignored.

   
  Some of the complexity comes from the fact that a single database can have 2 different
collations associated with its columns, i.e. the SYS schema will always use UCS_BASIC for its
collation, but all the user schemas will use either UCS_BASIC or territory based collation. If
the collation was of only one type for the entire database, the design/implementation would
have been far easier and we could keep collation information at database level rather than
column level. 
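
  As a rough illustration of the second proposal, the sketch below keeps a single character
type and attaches a collation type plus a collation derivation to its descriptor, consulting
them only at comparison time. Every name here (CollationSketch, CharTypeDescriptor, forColumn,
the IMPLICIT derivation, the "SYS" prefix test) is hypothetical and is not Derby's actual
DataTypeDescriptor API. Throwing when the derivation is "none" is also an assumption of this
sketch; the page only says that the collation type should be ignored in that case.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical sketch only; none of these names are taken from Derby's code base.
final class CollationSketch {
    enum CollationType { UCS_BASIC, TERRITORY_BASED }

    // The page names only "none"; IMPLICIT is an assumed placeholder for
    // "a collation was derived and may be used".
    enum CollationDerivation { NONE, IMPLICIT }

    // Stand-in for the collation fields carried by a DTD of a character string type.
    static final class CharTypeDescriptor {
        final CollationType type;
        final CollationDerivation derivation;

        CharTypeDescriptor(CollationType type, CollationDerivation derivation) {
            this.type = type;
            this.derivation = derivation;
        }
    }

    // Columns in SYS schemas always get UCS_BASIC; columns in user schemas get
    // whatever collation was requested when the database was created.
    static CharTypeDescriptor forColumn(String schemaName, CollationType databaseCollation) {
        boolean systemSchema = schemaName.startsWith("SYS"); // assumed naming test
        CollationType type = systemSchema ? CollationType.UCS_BASIC : databaseCollation;
        return new CharTypeDescriptor(type, CollationDerivation.IMPLICIT);
    }

    // The two flavours of CHAR differ only here, at collation time.
    static int compare(String a, String b, CharTypeDescriptor dtd, Locale territory) {
        if (dtd.derivation == CollationDerivation.NONE) {
            // Assumption: with derivation "none" the collation type is not usable,
            // so this sketch refuses to order the values.
            throw new IllegalStateException("no collation available for comparison");
        }
        if (dtd.type == CollationType.TERRITORY_BASED) {
            return Collator.getInstance(territory).compare(a, b);
        }
        // UCS_BASIC approximated by Java's UTF-16 code unit ordering.
        return a.compareTo(b);
    }
}
```

  With an English territory, "a" sorts before "B" for a user column created with territory
based collation, while the same comparison on a SYS column uses UCS_BASIC and places "B"
first.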
  
