db-derby-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "BuiltInLanguageBasedOrderingDERBY-1478" by MamtaSatoor
Date Sat, 31 Mar 2007 23:39:55 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by MamtaSatoor:
http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478

------------------------------------------------------------------------------
  
  2)Store column-level collation metadata in the Language Layer as well. This will happen in
DataTypeDescriptor (DTD) with the addition of an int collateType field. It will be set to 0 (UCS_BASIC), 1 (TERRITORY_BASED) or -1 (UNKNOWN).
There will be get and set methods on DTD for this new field.
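
As a rough illustration of item 2, the new field and its accessors might look like the sketch below. The class name, constant names, accessor names and the UNKNOWN default are assumptions made for this example; only the field name collateType and the values 0/1/-1 come from the text above.

```java
// Hypothetical sketch only: this is not Derby's DataTypeDescriptor.
final class CollateTypeSketch {
    // Values taken from the wiki text; the constant names are assumed.
    static final int UNKNOWN = -1;
    static final int UCS_BASIC = 0;
    static final int TERRITORY_BASED = 1;

    // Assumption: a descriptor starts out UNKNOWN until bind time sets it.
    private int collateType = UNKNOWN;

    int getCollateType() {
        return collateType;
    }

    void setCollateType(int collateType) {
        // Reject anything other than the three documented values.
        if (collateType < UNKNOWN || collateType > TERRITORY_BASED) {
            throw new IllegalArgumentException("bad collate type: " + collateType);
        }
        this.collateType = collateType;
    }
}
```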
  
- 3)The TypeDescriptor for character columns always has 0 for scale because scale does not
apply to character datatypes. Starting Derby 10.3, this scale field in TypeDescriptor will
be overloaded to indicate the collate type of the character. So, if user has requested for
TERRITORY_BASED collation, then the scale in TypeDescriptor for user columns(character) will
be 1(TERRITORY_BASED). The scale will be always 0(UCS_BASIC) for SYS schema character columns
and for databases with collation set to UCS_BASIC. 
+ 3)The type definition of a data type is described by DTD (DataTypeDescriptor). This DTD
will have two additional attributes called collation type and collation derivation. As per
the SQL spec, the collation derivation can hold 3 values: "explicit", "implicit" and "none". In
Derby 10.3, the collation derivation will never be "explicit" because Derby 10.3 does not
support the SQL Standard's COLLATE clause. In Derby 10.3, the collation derivation can be "implicit"
or "none". If the collation derivation is "none", the collation type can't be determined.
This can happen when an aggregate function is working with operands of different collation
types. If the result of such an aggregate function is a character string type, then its collation
derivation will be "none", i.e. it cannot be determined. Other than this aggregate "none" case,
the collation derivation will always be "implicit" and the collation type will be UCS_BASIC or TERRITORY_BASED.
Which one of the 2 collation types is picked for a character string type is explained in section "Collation Determination".
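
The derivation rule in item 3 can be sketched roughly as follows. All names here are hypothetical, and treating an operand that is already "none" as contagious is an assumption of this example, not something the page states.

```java
// Hypothetical sketch of collation type plus derivation; not Derby code.
final class CollationInfoSketch {
    // EXPLICIT is listed by the SQL spec but is never produced in Derby 10.3.
    enum Derivation { EXPLICIT, IMPLICIT, NONE }

    static final int UNKNOWN = -1;
    static final int UCS_BASIC = 0;
    static final int TERRITORY_BASED = 1;

    final int type;
    final Derivation derivation;

    private CollationInfoSketch(int type, Derivation derivation) {
        this.type = type;
        this.derivation = derivation;
    }

    static CollationInfoSketch implicit(int type) {
        return new CollationInfoSketch(type, Derivation.IMPLICIT);
    }

    // Result collation for an aggregate over two character operands.
    static CollationInfoSketch combine(CollationInfoSketch a, CollationInfoSketch b) {
        // Assumption: an operand that is already "none" keeps the result "none".
        boolean undetermined = a.derivation == Derivation.NONE
                || b.derivation == Derivation.NONE
                || a.type != b.type;
        if (undetermined) {
            return new CollationInfoSketch(UNKNOWN, Derivation.NONE);
        }
        return implicit(a.type);
    }
}
```

Combining a UCS_BASIC operand with a TERRITORY_BASED operand yields derivation "none", while two operands of the same collation type keep that type with derivation "implicit".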
  
- 4)The type definition of a column is described by DTD (DataTypeDescriptor). This DTD will
have an additional attribute called collation type. The correct association of collation to
the DTD for system or user columns is easy and it will happen at bind time. But there are
other character expressions who are either string literals, or result of cast, trim, upper,
lower, substring, concatenate etc. Determining their collation type requires special handling.
+ 4)The TypeDescriptor for character columns always has 0 for scale because scale does not
apply to character datatypes. Starting with Derby 10.3, this scale field in TypeDescriptor will
be overloaded to indicate the collation type of the character column. So, if the user has requested
TERRITORY_BASED collation, then the scale in TypeDescriptor for user character columns will
be 1 (TERRITORY_BASED). The scale will always be 0 (UCS_BASIC) for SYS schema character columns
and for databases with collation set to UCS_BASIC.
  
- 5)For a string literal which is not inside an operation like upper/lower/substring etc,
its collation type in DTD will be marked UNKNOWN. When such a string literal gets used in
a collation method, its collation type will be same as the other operand involved in the
collation eg sysColumn1 < 'aaa', then the collation type of 'aaa' will change from UNKNOWN
to UCS_BASIC at the comparison time. But if the comparison was userColumn1 < 'aaa', then
the collation type of 'aaa' will be that of the collation type of userColumn1. As a third
case, if the comparison was between 2 string literals, ie 'aaa' < 'bbb', then the collation
type of each of the string literal will be the COLLATION applicable at the user character
level.
+ 5)When a character column is added using CREATE TABLE/ALTER TABLE, make sure that the correct
collate type is populated in the TypeDescriptor's scale field in the SYS.SYSCOLUMNS table.
  
-    '''Question''' Does this match the SQL standard?
+ 6)For both a newly created 10.3 database and an upgraded 10.3 database, make sure that the
scale for character datatypes continues to be 0 (rather than the collation type value) through
the metadata. The overloading of scale in TypeDescriptor as collation for character datatypes
should be transparent to the end user. We should include a test for the scale of character datatypes.
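
One way to picture items 4 to 6 together is a descriptor whose stored scale carries the collation type while the scale reported to users stays 0. This is a sketch with hypothetical names, not the actual TypeDescriptor code.

```java
// Hypothetical sketch of the overloaded scale field for character types.
final class CharTypeDescriptorSketch {
    static final int UCS_BASIC = 0;
    static final int TERRITORY_BASED = 1;

    // For character types this slot holds the collation type, not a real scale.
    private final int storedScale;

    CharTypeDescriptorSketch(boolean sysSchemaColumn, int databaseCollation) {
        // SYS schema character columns are always UCS_BASIC;
        // user columns follow the collation requested for the database.
        this.storedScale = sysSchemaColumn ? UCS_BASIC : databaseCollation;
    }

    // Internal view: what the overloaded field really means.
    int getCollationType() {
        return storedScale;
    }

    // External (metadata) view: scale of a character type is always 0.
    int getScale() {
        return 0;
    }
}
```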
  
- 6)As for the character expressions involving CAST, TRIM, UPPER, LOWER, SUBSTRING, CONCATENATE,
the result character datatype will have the same collation type as their operands. 
+ 7)Currently, store uses Monitor to create DVD template rows. The logic of creating DVDs
using formatids should be factored out from Monitor into DataValueFactory. Talking in terms
of code, RowUtil.newClassInfoTemplate should call DVF.classFromIdentifier rather than Monitor.classFromIdentifier.
  
-    '''Questions''' What about other character expressions, such as functions? What happens
when CONCATENATE is passed two values with different collations?
+ 8)This item is related to item 10. With Derby 10.3, collation type will be the additional
metadata in store for each column. When store calls DVF to create a DVD template row, it
will pass the formatids and the collation types. DVF will need to be able to associate the
correct Collator with the DVD for Char datatypes depending on the collation type. And in order
to find the correct Collator, DVF needs to know the locale of the database. This locale information
will be set on DVF using a new method on DVF called void setLocale(Locale). This call will
be made by BasicDatabase after DVF has finished booting and before store starts booting.
  
- 7)When a character column is added using CREATE TABLE/ALTER TABLE, make sure that the correct
collate type is populated in the TypeDescriptor's scale field in the SYS.SYSCOLUMNS table.
+ 9)This item is related to item 11. When DVF gets called by store to create the right DVD for a
given formatid and collation type, for formatids associated with character datatypes, it will
first create the base character datatype class, say SQLChar. Then it will call the getValue
method on the DVD with the RuleBasedCollator corresponding to the collation type as the parameter.
(This RuleBasedCollator will be null for UCS_BASIC collation). The getValue method will return
SQLChar or CollatorSQLChar depending on whether RuleBasedCollator is null or not. getValue
is the new method which needs to be added to the interface StringDataValue.
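
The flow described in items 8 and 9 might be sketched as below. Every class here is a simplified, hypothetical stand-in (the real SQLChar, CollatorSQLChar and DataValueFactory have far larger interfaces), and the newCharValue method name, the compare method and the use of Collator.getInstance to obtain the collator are assumptions of this example.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Stand-in for SQLChar: compares by code unit order (UCS_BASIC-like).
class SketchSQLChar {
    protected final String value;

    SketchSQLChar(String value) {
        this.value = value;
    }

    // Mirrors the proposed getValue: a null collator means UCS_BASIC, so the
    // plain value is returned; otherwise a collator-aware subclass is created.
    SketchSQLChar getValue(RuleBasedCollator collator) {
        return collator == null ? this : new SketchCollatorSQLChar(value, collator);
    }

    int compare(SketchSQLChar other) {
        return value.compareTo(other.value);
    }
}

// Stand-in for CollatorSQLChar: overrides the collation related method.
class SketchCollatorSQLChar extends SketchSQLChar {
    private final RuleBasedCollator collator;

    SketchCollatorSQLChar(String value, RuleBasedCollator collator) {
        super(value);
        this.collator = collator;
    }

    @Override
    int compare(SketchSQLChar other) {
        return collator.compare(value, other.value);
    }
}

// Stand-in for DataValueFactory with the proposed setLocale(Locale) method.
class SketchDataValueFactory {
    private Locale locale;

    void setLocale(Locale locale) {
        this.locale = locale;
    }

    // collationType: 0 = UCS_BASIC, 1 = TERRITORY_BASED (values from item 2).
    SketchSQLChar newCharValue(String value, int collationType) {
        RuleBasedCollator collator = null;
        if (collationType == 1) {
            if (locale == null) {
                throw new IllegalStateException("setLocale must be called first");
            }
            Collator c = Collator.getInstance(locale);
            if (!(c instanceof RuleBasedCollator)) {
                throw new IllegalStateException("no RuleBasedCollator for " + locale);
            }
            collator = (RuleBasedCollator) c;
        }
        return new SketchSQLChar(value).getValue(collator);
    }
}
```

With an English locale, the plain value orders "a" after "B" (code unit order), while the collator-aware value orders "a" before "B", which is the behavioral difference the subclass exists to provide.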
  
- 8)For both a newly created 10.3 database and an upgraded 10.3 database, make sure that the
scale for character datatypes continue to be 0 (rather than the collation type value) through
the metadata. The overloading of scale in TypeDescriptor as collation for character datatypes
should be transparent to the end user. We should include test for the scale of character datatype.
+ 10)Override all the collation related methods in the CollatorSQLChar. CollatorSQLChar is
a subclass of SQLChar.
  
- 9)Currently, store uses Monitor to create DVD template rows. The logic of creating DVDs
using formatids should be factored out from Monitor into DataValueFactory. Talking in terms
of code, RowUtil.newClassInfoTemplate should call DVF.classFromIdentifier rather than Monitor.classFromIdentifier.
+ 11)Add subclasses for SQLVarchar, SQLLongvarchar, SQLClob. These subclasses will override
the collation related methods in their superclasses.
  
- 10)This item is related to item 10. With Derby 10.3, collation type will be the additional
metadata in store for each column. When store will call DVF to create DVD template row, it
will pass the formatids and the collation types. DVF will need to be able to associate the
correct Collator with the DVD for Char datatypes depending on the collation type. And in order
to find the correct Collator, DVF needs to know the locale of the database. This locale information
will be set on DVF using a new method on DVF called void setLocale(Locale). This call will
be made by BasicDatabase after DVF has finished booting and before store starts booting.
+ 12)CollatorSQLChar has a method called getCollationElementsForString which currently gets
called by the like method. getCollationElementsForString gets the collation elements for the whole value
of the CollatorSQLChar. But say the like method is looking for pattern 'A%' and the value of
CollatorSQLChar is 'BXXXXXXXXXXXXXXXXXXXXXXX'. This is an example of a case where it would be
better to get collation elements one character of the CollatorSQLChar value at a time, so we
don't go through the process of getting collation elements for the entire string when we don't
really need them. This is a performance issue and could be taken up at the end of the implementation.
Comments on this from Dan and Dag can be found in DERBY-2416.
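
The idea in item 12 can be illustrated with java.text.CollationElementIterator, which hands out collation elements one at a time. The helper below is a hypothetical sketch of a prefix check for a pattern like 'A%'; it is not the Derby like implementation, and comparing raw element values is a simplifying assumption that ignores collator strength and ignorable elements.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

final class LazyCollationPrefixSketch {

    // Returns true if the collation elements of prefix are a leading run of
    // the collation elements of value. Stops at the first mismatch, so for
    // prefix "A" and value "BXXXX..." only one element of value is examined.
    static boolean startsWith(RuleBasedCollator collator, String value, String prefix) {
        CollationElementIterator valueIter = collator.getCollationElementIterator(value);
        CollationElementIterator prefixIter = collator.getCollationElementIterator(prefix);
        int prefixElement;
        while ((prefixElement = prefixIter.next()) != CollationElementIterator.NULLORDER) {
            // If value is exhausted, next() returns NULLORDER, which never
            // equals a real prefix element, so this also returns false.
            if (valueIter.next() != prefixElement) {
                return false;
            }
        }
        return true;
    }

    // Assumption for the example: the JDK's English collator is rule based.
    static RuleBasedCollator englishCollator() {
        return (RuleBasedCollator) Collator.getInstance(Locale.ENGLISH);
    }
}
```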
  
- 11)This item is related to item 11. When DVF gets called by store to create right DVD for
given formatid and collation type, for formatids associated with character datatypes, it will
first create the base character datatype class which is say SQLChar. Then it will call getValue
method on the DVD with the RuleBasedCollator corresponding to the collation type as the parameter.
(This RuleBasedCollator will be null for UCS_BASIC collation). The getValue method will return
SQLChar or CollatorSQLChar depending on whether RuleBasedCollator is null or not. getValue
is the new method which needs to be added to the interface StringDataValue.
- 
- 12)Override all the collation related methods in the CollatorSQLChar. CollatorSQLChar is
a subclass of SQLChar.
- 
- 13)Add subclasses for SQLVarchar, SQLLongvarchar, SQLClob. These subclasses will override
the collation related methods in their superclasses.
- 
- 14)CollatorSQLChar has a method called getCollationElementsForString which currently gets
called by like method. getCollationElementsForString gets the collation elements for the value
of CollatorSQLChar class. But say like method is looking for pattern 'A%' and the value of
CollatorSQLChar is 'BXXXXXXXXXXXXXXXXXXXXXXX'. This is eg of one case where it would have
been better to get collation element one character of CollatorSQLChar value at a time so we
don't go through the process of getting collation elements for the entire string when we don't
really need. This is a performance issue and could be taken up at the end of the implementation.
Comments on this from Dan and Dag can be found in DERBY-2416. 
- 
- 15)Add tests for this feature. This a broad umbrella task but I do want to mention 3 specific
tests that we should be testing
+ 13)Add tests for this feature. This is a broad umbrella task, but I do want to mention 3 specific
tests that we should have
  a)Make sure the scale of the character datatypes is always 0 and that it was not affected
by the overloading of the scale field as collation type in TypeDescriptor.
  b)Test case for recovery - have an outstanding transaction with insert/delete/updates that
affect one or more character indexes (all with a collation setting that is different from
the default collation). Make sure those log records get to the log and then crash the server.
Restarting the server will then run through the recovery code and will ensure that we test
for correct collation usage at recovery time. Mike has put more info about this in DERBY-2336.
  c)CREATE VIEW should have collation type UCS_BASIC/TERRITORY_BASED associated with its
character columns. The exact collation will be determined by the value of the COLLATION
attribute. This is the same as what would happen for CREATE TABLE. Have a test for global temporary
tables with character columns too.
  
- 16)Make sure the space padding at the end of various character datatypes is implemented
commented correctly in javadocs. This padding is used in collation related methods. For eg
check SQLChar.stringCompare method.
+ 14)Make sure the space padding at the end of various character datatypes is implemented
and commented correctly in javadocs. This padding is used in collation related methods. For example,
check the SQLChar.stringCompare method.
  
  == Implemented items ==
  1)A shell for subclass of SQLChar has been implemented and it is called CollatorSQLChar.
It resides in derby.iapi.types package. This work was done by revisions 516864, 516869 and
518479.
