db-derby-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "BuiltInLanguageBasedOrderingDERBY-1478" by MamtaSatoor
Date Sat, 31 Mar 2007 23:25:18 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by MamtaSatoor:
http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478

------------------------------------------------------------------------------
  
  
  == Outstanding items ==
- 1)At the time of upgrade of pre-10.3 database, we should make sure that derby.database.collation
property with value UCS_BASIC is added to properties conglomerate. This is because we do not
plan on supporting collation change for existing databases. All of the pre-10.3 databases
will continue to use their old collation (UCS_BASIC) after upgrade to 10.3 release.
+ 1)Store column level metadata for collation in Store. Store keeps a version number that describes
the structure of column level metadata. For existing pre-10.3 databases which get upgraded
to 10.3 and for new 10.3 databases with default collation (UCS_BASIC), the structure of column
level metadata will remain the same as the 10.2 structure of column level metadata, i.e. they will not
include collation information in their store metadata. A new version will be used in Store
for the structure of column level metadata if the newly created 10.3 database has asked for territory
based collation. In other words, collation information will be kept in Store column level
metadata only if we are working with a newly created 10.3 database with territory based collation.
This approach ensures that we do not have to do an on-disk store metadata upgrade when
upgrading a pre-10.3 database to 10.3.
  
- 2)Store column level metadata for collate in Store. Store keeps a version number that describes
the structure of column level metadata. For existing pre-10.3 databases which get upgraded
to 10.3 and for new 10.3 databases with default collation(UCS_BASIC), the structure of column
level metadata will remain same as 10.2 structure of column level metadata, ie they will not
include collate information in their store metadata. A new version would be used in Store
for structure of column level metadata if the newly created 10.3 database has asked for territory
based collation. In other words, information about collate will be kept in Store column level
metadata only if we are working with a 10.3 newly created database with territory based collation.
This approach will make sure that we do not have to do an on-disk metadata upgrade when upgrading
a pre-10.3 database to 10.3 version.
+ 2)Store column level metadata for collation in the Language Layer as well. This will happen in
DataTypeDescriptor (DTD) with the addition of an int collateType field. It will be set to 0 (UCS_BASIC), 1 (TERRITORY_BASED) or -1 (UNKNOWN).
There will be get and set methods on DTD for this new field.
  
- 3)Store column level metadata for collate in Language Layer as well. This will happen in
DataTypeDescriptor(DTD) with the addition of int collateType field. It will be set to 0(UCS_BASIC)/1(TERRITORY_BASED)/-1(UNKNOWN).
There will be get and set methods on DTD for this new field.
+ 3)The TypeDescriptor for character columns always has 0 for scale because scale does not
apply to character datatypes. Starting with Derby 10.3, this scale field in TypeDescriptor will
be overloaded to indicate the collation type of the character column. So, if the user has requested
TERRITORY_BASED collation, then the scale in TypeDescriptor for user character columns will
be 1 (TERRITORY_BASED). The scale will always be 0 (UCS_BASIC) for SYS schema character columns
and for databases with collation set to UCS_BASIC.
  
- 4)The TypeDescriptor for character columns always has 0 for scale because scale does not
apply to character datatypes. Starting Derby 10.3, this scale field in TypeDescriptor will
be overloaded to indicate the collate type of the character. So, if user has requested for
TERRITORY_BASED collation, then the scale in TypeDescriptor for user columns(character) will
be 1(TERRITORY_BASED). The scale will be always 0(UCS_BASIC) for SYS schema character columns
and for databases with collation set to UCS_BASIC. 
+ 4)The type definition of a column is described by DTD (DataTypeDescriptor). This DTD will
have an additional attribute called collation type. The correct association of collation with
the DTD for system or user columns is easy and will happen at bind time. But there are
other character expressions that are either string literals or the result of cast, trim, upper,
lower, substring, concatenate etc. Determining their collation type requires special handling.
  
- 5)The type definition of a column is described by DTD (DataTypeDescriptor). This DTD will
have an additional attribute called collation type. The correct association of collation to
the DTD for system or user columns is easy and it will happen at bind time. But there are
other character expressions who are either string literals, or result of cast, trim, upper,
lower, substring, concatenate etc. Determining their collation type requires special handling.
- 
- 6)For a string literal which is not inside an operation like upper/lower/substring etc,
its collation type in DTD will be marked UNKNOWN. When such a string literal gets used in
a collation method, its collation type will be same as the other operand involved in the
collation eg sysColumn1 < 'aaa', then the collation type of 'aaa' will change from UNKNOWN
to UCS_BASIC at the comparison time. But if the comparison was userColumn1 < 'aaa', then
the collation type of 'aaa' will be that of the collation type of userColumn1. As a third
case, if the comparison was between 2 string literals, ie 'aaa' < 'bbb', then the collation
type of each of the string literal will be the COLLATION applicable at the user character
level.
+ 5)For a string literal which is not inside an operation like upper/lower/substring etc,
its collation type in DTD will be marked UNKNOWN. When such a string literal gets used in
a collation method, its collation type will be the same as that of the other operand involved in the
collation. For example, in sysColumn1 < 'aaa', the collation type of 'aaa' will change from UNKNOWN
to UCS_BASIC at comparison time. But if the comparison is userColumn1 < 'aaa', then
the collation type of 'aaa' will be the collation type of userColumn1. As a third
case, if the comparison is between 2 string literals, i.e. 'aaa' < 'bbb', then the collation
type of each string literal will be the COLLATION applicable at the user character
level.
  
     '''Question''' Does this match the SQL standard?
  
- 7)As for the character expressions involving CAST, TRIM, UPPER, LOWER, SUBSTRING, CONCATENATE,
the result character datatype will have the same collation type as their operands. 
+ 6)As for the character expressions involving CAST, TRIM, UPPER, LOWER, SUBSTRING, CONCATENATE,
the resulting character datatype will have the same collation type as its operands.
  
     '''Questions''' What about other character expressions, such as functions? What happens
when CONCATENATE is passed two values with different collations?
  
- 8)When a character column is added using CREATE TABLE/ALTER TABLE, make sure that the correct
collate type is populated in the TypeDescriptor's scale field in the SYS.SYSCOLUMNS table.
+ 7)When a character column is added using CREATE TABLE/ALTER TABLE, make sure that the correct
collation type is populated in the TypeDescriptor's scale field in the SYS.SYSCOLUMNS table.
  
- 9)For both a newly created 10.3 database and an upgraded 10.3 database, make sure that the
scale for character datatypes continue to be 0 (rather than the collation type value) through
the metadata. The overloading of scale in TypeDescriptor as collation for character datatypes
should be transparent to the end user. We should include test for the scale of character datatype.
+ 8)For both a newly created 10.3 database and an upgraded 10.3 database, make sure that the
scale for character datatypes continues to be 0 (rather than the collation type value) through
the metadata. The overloading of scale in TypeDescriptor as collation for character datatypes
should be transparent to the end user. We should include a test for the scale of character datatypes.
  
- 10)Currently, store uses Monitor to create DVD template rows. The logic of creating DVDs
using formatids should be factored out from Monitor into DataValueFactory. Talking in terms
of code, RowUtil.newClassInfoTemplate should call DVF.classFromIdentifier rather than Monitor.classFromIdentifier.
+ 9)Currently, store uses Monitor to create DVD template rows. The logic of creating DVDs
using formatids should be factored out from Monitor into DataValueFactory. Talking in terms
of code, RowUtil.newClassInfoTemplate should call DVF.classFromIdentifier rather than Monitor.classFromIdentifier.
  
- 11)This item is related to item 10. With Derby 10.3, collation type will be the additional
metadata in store for each column. When store will call DVF to create DVD template row, it
will pass the formatids and the collation types. DVF will need to be able to associate the
correct Collator with the DVD for Char datatypes depending on the collation type. And in order
to find the correct Collator, DVF needs to know the locale of the database. This locale information
will be set on DVF using a new method on DVF called void setLocale(Locale). This call will
be made by BasicDatabase after DVF has finished booting and before store starts booting.
+ 10)This item is related to item 9. With Derby 10.3, collation type will be the additional
metadata in store for each column. When store calls DVF to create a DVD template row, it
will pass the formatids and the collation types. DVF will need to be able to associate the
correct Collator with the DVD for Char datatypes depending on the collation type. And in order
to find the correct Collator, DVF needs to know the locale of the database. This locale information
will be set on DVF using a new method on DVF called void setLocale(Locale). This call will
be made by BasicDatabase after DVF has finished booting and before store starts booting.
  
- 12)This item is related to item 11. When DVF gets called by store to create right DVD for
given formatid and collation type, for formatids associated with character datatypes, it will
first create the base character datatype class which is say SQLChar. Then it will call getValue
method on the DVD with the RuleBasedCollator corresponding to the collation type as the parameter.
(This RuleBasedCollator will be null for UCS_BASIC collation). The getValue method will return
SQLChar or CollatorSQLChar depending on whether RuleBasedCollator is null or not. getValue
is the new method which needs to be added to the interface StringDataValue.
+ 11)This item is related to item 10. When DVF gets called by store to create the right DVD for a
given formatid and collation type, for formatids associated with character datatypes, it will
first create the base character datatype class, say SQLChar. Then it will call the getValue
method on the DVD with the RuleBasedCollator corresponding to the collation type as the parameter.
(This RuleBasedCollator will be null for UCS_BASIC collation). The getValue method will return
SQLChar or CollatorSQLChar depending on whether RuleBasedCollator is null or not. getValue
is the new method which needs to be added to the interface StringDataValue.
  
  12)Override all the collation related methods in the CollatorSQLChar. CollatorSQLChar is
a subclass of SQLChar.
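
The rule in item 5 for string literals can be sketched roughly as follows. This is only an illustration under assumptions: the class name, the method name and the constant names are hypothetical and not part of Derby; only the values 0, 1 and -1 come from item 2.

```java
/** Hypothetical helper illustrating item 5; not Derby code. */
final class LiteralCollationSketch {
    // Values taken from item 2; the constant names are assumptions.
    static final int UNKNOWN = -1;
    static final int UCS_BASIC = 0;
    static final int TERRITORY_BASED = 1;

    private LiteralCollationSketch() {}

    /**
     * Returns the collation type an operand would use in a comparison.
     * An operand whose collation is already known keeps it. An UNKNOWN
     * operand (a bare string literal) takes the other operand's collation,
     * and if both are UNKNOWN it falls back to the user-level collation.
     */
    static int resolve(int operand, int otherOperand, int userLevelCollation) {
        if (operand != UNKNOWN) {
            return operand;
        }
        if (otherOperand != UNKNOWN) {
            return otherOperand;
        }
        return userLevelCollation;
    }
}
```

With this sketch, the literal in sysColumn1 < 'aaa' resolves to UCS_BASIC, the literal in userColumn1 < 'aaa' resolves to whatever userColumn1 uses, and both literals in 'aaa' < 'bbb' resolve to the user-level collation.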
  
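Items 11 and 12 could look something like the sketch below. It is an assumption-laden illustration, not the Derby classes: SketchSQLChar, SketchCollatorSQLChar and SketchCollators are stand-in names, and only the idea that getValue returns a collator-aware subclass when the RuleBasedCollator is non-null comes from the text above.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

/** Stand-in for SQLChar: compares by code point order (UCS_BASIC). */
class SketchSQLChar {
    protected final String value;

    SketchSQLChar(String value) {
        this.value = value;
    }

    int compare(SketchSQLChar other) {
        return value.compareTo(other.value);
    }

    /** Returns this object for a null collator, else a collator-aware copy. */
    SketchSQLChar getValue(RuleBasedCollator collator) {
        return (collator == null) ? this : new SketchCollatorSQLChar(value, collator);
    }
}

/** Stand-in for CollatorSQLChar: overrides the collation related method. */
class SketchCollatorSQLChar extends SketchSQLChar {
    private final RuleBasedCollator collator;

    SketchCollatorSQLChar(String value, RuleBasedCollator collator) {
        super(value);
        this.collator = collator;
    }

    @Override
    int compare(SketchSQLChar other) {
        return collator.compare(value, other.value);
    }
}

/** Hypothetical helper that hides the checked ParseException. */
final class SketchCollators {
    private SketchCollators() {}

    static RuleBasedCollator fromRules(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules: " + rules, e);
        }
    }
}
```

A collator built from the rules "< a < B" orders "a" before "B", whereas the plain code point comparison orders "B" before "a", so the two classes give different answers for the same pair of values.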

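The scale overloading from items 3 and 8 might be pictured as below. Again this is a minimal sketch under assumptions: SketchTypeDescriptor and its method names are hypothetical, and the real TypeDescriptor has many more attributes than the two modelled here.

```java
/** Hypothetical, reduced model of a type descriptor; not the Derby class. */
final class SketchTypeDescriptor {
    // Values taken from item 2; the constant names are assumptions.
    static final int UCS_BASIC = 0;
    static final int TERRITORY_BASED = 1;

    private final boolean characterType;
    // For character types this slot holds the collation type (item 3).
    private final int storedScale;

    private SketchTypeDescriptor(boolean characterType, int storedScale) {
        this.characterType = characterType;
        this.storedScale = storedScale;
    }

    static SketchTypeDescriptor forCharacter(int collationType) {
        return new SketchTypeDescriptor(true, collationType);
    }

    static SketchTypeDescriptor forNumeric(int scale) {
        return new SketchTypeDescriptor(false, scale);
    }

    /** Raw value as it would be persisted in the scale field. */
    int getStoredScale() {
        return storedScale;
    }

    /** Scale shown to the end user: always 0 for character types (item 8). */
    int getReportedScale() {
        return characterType ? 0 : storedScale;
    }

    /** Collation type, meaningful only for character types. */
    int getCollationType() {
        if (!characterType) {
            throw new IllegalStateException("collation applies to character types only");
        }
        return storedScale;
    }
}
```

Keeping the reported scale separate from the stored scale is one way to make the overloading invisible to users, which is what the test mentioned in item 8 would check.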