db-derby-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "BuiltInLanguageBasedOrderingDERBY-1478" by MamtaSatoor
Date Mon, 02 Apr 2007 03:11:39 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by MamtaSatoor:
http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478

------------------------------------------------------------------------------
   
  == Outstanding items ==
  Language changes
+ 
  1)The type definition of a data type is described by DTD (DataTypeDescriptor). This DTD
will have two additional attributes called collation type and collation derivation. As per
SQL spec, the collation derivation can hold 3 values, "explicit", "implicit" and "none". In
Derby 10.3, the collation derivation will never be "explicit" because Derby 10.3 does not
support SQL Standard's COLLATE clause. In Derby 10.3, the collation derivation can be "implicit"
or "none". If collation derivation is "none", then it means the collation type can't be determined.
This can happen when an aggregate function is working with operands of different collation
types. If the result of such an aggregate function is a character string type, then its collation
derivation will be "none", i.e. it cannot be determined. Other than this aggregate "none" case,
the collation derivation will always be "implicit" and collation type will be UCS_BASIC/TERRITORY_BASED.
Which one of the 2 collation types is picked for a character string type is explained in section "Collation Determination".
  
  2)The TypeDescriptor for character columns always has 0 for scale because scale does not
apply to character datatypes. Starting with Derby 10.3, this scale field in TypeDescriptor will
be overloaded to indicate the collation type of the character column. So, if the user has requested
TERRITORY_BASED collation, then the scale in TypeDescriptor for user character columns will
be 1(TERRITORY_BASED). The scale will always be 0(UCS_BASIC) for SYS schema character columns
and for databases with collation set to UCS_BASIC. 
@@ -72, +73 @@

  
  4)Store column level metadata for collate in Language Layer as well. This will happen in
DataTypeDescriptor(DTD) with the addition of int collateType field. It will be set to 0(UCS_BASIC)/1(TERRITORY_BASED)/-1(UNKNOWN).
There will be get and set methods on DTD for this new field.
  
+ 5)WorkHorseForCollatorDatatypes should override all the collation related methods so that
it uses the non-default Collator. All the non-default-collation-sensitive classes have an
instance of WorkHorseForCollatorDatatypes which is used to call the collation related methods.
This ensures that these collation related methods are implemented in one central place rather
than in all the collation-sensitive classes. 
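
The collation attributes described in the language items above can be sketched roughly as follows. This is only an illustration, not Derby's DataTypeDescriptor: the class name CollationInfoSketch, the Derivation enum, and the resultOf helper are hypothetical names invented for this example. Only the int values 0(UCS_BASIC)/1(TERRITORY_BASED)/-1(UNKNOWN) and the three derivation values come from the text.

```java
import java.util.List;

// Hypothetical stand-in for the two collation attributes planned for DTD.
final class CollationInfoSketch {
    // int values for the collation type, as listed in item 4.
    static final int COLLATION_TYPE_UNKNOWN = -1;
    static final int COLLATION_TYPE_UCS_BASIC = 0;
    static final int COLLATION_TYPE_TERRITORY_BASED = 1;

    // The three derivation values named by the SQL spec (item 1).
    enum Derivation { EXPLICIT, IMPLICIT, NONE }

    final int collationType;
    final Derivation derivation;

    CollationInfoSketch(int collationType, Derivation derivation) {
        this.collationType = collationType;
        this.derivation = derivation;
    }

    // Convenience for the common 10.3 case: an implicit derivation.
    static CollationInfoSketch implicit(int collationType) {
        return new CollationInfoSketch(collationType, Derivation.IMPLICIT);
    }

    // Assumed combination rule: if every operand has a determined and
    // identical collation type, the result keeps it with derivation
    // IMPLICIT; otherwise the type is UNKNOWN and the derivation is NONE.
    static CollationInfoSketch resultOf(List<CollationInfoSketch> operands) {
        if (operands.isEmpty()) {
            throw new IllegalArgumentException("at least one operand is required");
        }
        CollationInfoSketch first = operands.get(0);
        for (CollationInfoSketch operand : operands) {
            if (operand.derivation == Derivation.NONE
                    || operand.collationType != first.collationType) {
                return new CollationInfoSketch(COLLATION_TYPE_UNKNOWN, Derivation.NONE);
            }
        }
        return implicit(first.collationType);
    }
}
```

With this sketch, combining two TERRITORY_BASED operands keeps TERRITORY_BASED with an implicit derivation, while mixing UCS_BASIC and TERRITORY_BASED operands yields derivation "none".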
+ 
+ 
  Store changes
+ 
  1)Store column level metadata for collate in Store. Store keeps a version number that describes
the structure of column level metadata. For existing pre-10.3 databases which get upgraded
to 10.3 and for new 10.3 databases with default collation(UCS_BASIC), the structure of column
level metadata will remain same as 10.2 structure of column level metadata, ie they will not
include collate information in their store metadata. A new version would be used in Store
for structure of column level metadata if the newly created 10.3 database has asked for territory
based collation. In other words, information about collate will be kept in Store column level
metadata only if we are working with a 10.3 newly created database with territory based collation.
This approach will make sure that we do not have to do an on-disk store metadata upgrade when
upgrading a pre-10.3 database to 10.3 version.
  
- 7)Currently, store uses Monitor to create DVD template rows. The logic of creating DVDs
using formatids should be factored out from Monitor into DataValueFactory. Talking in terms
of code, RowUtil.newClassInfoTemplate should call DVF.classFromIdentifier rather than Monitor.classFromIdentifier.
+ 2)Currently, store uses Monitor to create DVD template rows. The logic of creating DVDs
using formatids should be factored out from Monitor into DataValueFactory. Talking in terms
of code, RowUtil.newClassInfoTemplate should call DVF.classFromIdentifier rather than Monitor.classFromIdentifier.
  
- 8)This item is related to item 10. With Derby 10.3, collation type will be the additional
metadata in store for each column. When store will call DVF to create DVD template row, it
will pass the formatids and the collation types. DVF will need to be able to associate the
correct Collator with the DVD for Char datatypes depending on the collation type. And in order
to find the correct Collator, DVF needs to know the locale of the database. This locale information
will be set on DVF using a new method on DVF called void setLocale(Locale). This call will
be made by BasicDatabase after DVF has finished booting and before store starts booting.
+ 3)This item is related to item 2. With Derby 10.3, collation type will be the additional
metadata in store for each column. When store will call DVF to create DVD template row, it
will pass the formatids and the collation types. DVF will need to be able to associate the
correct Collator with the DVD for Char datatypes depending on the collation type. And in order
to find the correct Collator, DVF needs to know the locale of the database. This locale information
will be set on DVF using a new method on DVF called void setLocale(Locale). This call will
be made by BasicDatabase after DVF has finished booting and before store starts booting.
  
- 9)This item is related to item 11. When DVF gets called by store to create right DVD for
given formatid and collation type, for formatids associated with character datatypes, it will
first create the base character datatype class which is say SQLChar. Then it will call getValue
method on the DVD with the RuleBasedCollator corresponding to the collation type as the parameter.
(This RuleBasedCollator will be null for UCS_BASIC collation). The getValue method will return
SQLChar or CollatorSQLChar depending on whether RuleBasedCollator is null or not. getValue
is the new method which needs to be added to the interface StringDataValue.
+ 4)This item is related to item 3. When DVF gets called by store to create right DVD for
given formatid and collation type, for formatids associated with character datatypes, it will
first create the base character datatype class which is say SQLChar. Then it will call getValue
method on the DVD with the RuleBasedCollator corresponding to the collation type as the parameter.
(This RuleBasedCollator will be null for UCS_BASIC collation). The getValue method will return
SQLChar or CollatorSQLChar depending on whether RuleBasedCollator is null or not. getValue
is the new method which needs to be added to the interface StringDataValue.
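
The flow described in store items 3 and 4 might look roughly like the sketch below. All class names here (SketchSQLChar, SketchCollatorSQLChar, SketchDataValueFactory) and the newCharTemplate method are hypothetical stand-ins, not Derby's real classes, and the dispatch on format ids is left out. Only setLocale(Locale), getValue with a RuleBasedCollator parameter, and the null collator for UCS_BASIC are taken from the text. The sketch also assumes that java.text.Collator.getInstance returns a RuleBasedCollator for the database locale.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical stand-in for the base character type (SQLChar in the text).
class SketchSQLChar {
    protected final String value;

    SketchSQLChar(String value) { this.value = value; }

    // Default ordering for this sketch: compare UTF-16 code units.
    int compare(SketchSQLChar other) { return value.compareTo(other.value); }

    // Mirrors the proposed getValue: a null collator keeps the base type,
    // a non-null collator yields the collator-aware subtype.
    SketchSQLChar getValue(RuleBasedCollator collator) {
        return collator == null ? this : new SketchCollatorSQLChar(value, collator);
    }
}

// Hypothetical stand-in for CollatorSQLChar.
class SketchCollatorSQLChar extends SketchSQLChar {
    private final RuleBasedCollator collator;

    SketchCollatorSQLChar(String value, RuleBasedCollator collator) {
        super(value);
        this.collator = collator;
    }

    @Override
    int compare(SketchSQLChar other) { return collator.compare(value, other.value); }
}

// Hypothetical stand-in for the DataValueFactory (DVF) side.
class SketchDataValueFactory {
    static final int UCS_BASIC = 0;
    static final int TERRITORY_BASED = 1;

    private RuleBasedCollator territoryCollator;

    // Would be called once after DVF boots and before store boots.
    void setLocale(Locale locale) {
        Collator c = Collator.getInstance(locale);
        if (!(c instanceof RuleBasedCollator)) {
            throw new IllegalStateException("no RuleBasedCollator for " + locale);
        }
        territoryCollator = (RuleBasedCollator) c;
    }

    // Creates the base type first, then lets getValue pick the final class.
    SketchSQLChar newCharTemplate(int collationType, String value) {
        if (collationType == TERRITORY_BASED && territoryCollator == null) {
            throw new IllegalStateException("setLocale must be called first");
        }
        RuleBasedCollator collator =
                collationType == TERRITORY_BASED ? territoryCollator : null;
        return new SketchSQLChar(value).getValue(collator);
    }
}
```

Keeping the choice between the two classes inside getValue means the factory needs only the collation type and the locale. It never has to name the collator-aware subclass itself.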
  
- 10)WorkHorseForCollatorDatatypes should override all the collation related methods so that
it uses the non-default Collator. All the non-default-collation-sensitive classes have an
instance of WorkHorseForCollatorDatatypes which is used to call the collation related methods.
This ensures that these collation related methods are implemented in one central place rather
than in all the collation-sensitive classes. 
+ Performance items
  
- 11)CollatorSQLChar has a method called getCollationElementsForString which currently gets
called by like method. getCollationElementsForString gets the collation elements for the value
of CollatorSQLChar class. But say like method is looking for pattern 'A%' and the value of
CollatorSQLChar is 'BXXXXXXXXXXXXXXXXXXXXXXX'. This is an example of a case where it would have
been better to get the collation elements of the CollatorSQLChar value one character at a time, so we
don't go through the process of getting collation elements for the entire string when we don't
really need them. This is a performance issue and could be taken up at the end of the implementation.
Comments on this from Dan and Dag can be found in DERBY-2416. 
+ 1)CollatorSQLChar has a method called getCollationElementsForString which currently gets
called by like method. getCollationElementsForString gets the collation elements for the value
of CollatorSQLChar class. But say like method is looking for pattern 'A%' and the value of
CollatorSQLChar is 'BXXXXXXXXXXXXXXXXXXXXXXX'. This is an example of a case where it would have
been better to get the collation elements of the CollatorSQLChar value one character at a time, so we
don't go through the process of getting collation elements for the entire string when we don't
really need them. This is a performance issue and could be taken up at the end of the implementation.
Comments on this from Dan and Dag can be found in DERBY-2416. 
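
One possible shape for the element-at-a-time comparison is sketched below using java.text.CollationElementIterator, which is requested from a RuleBasedCollator and returns elements one call at a time. The class LazyLikePrefixSketch and its startsWith method are hypothetical names for this example and are not part of CollatorSQLChar. The sketch covers only a pattern of the form 'prefix%' and compares raw collation elements, which is a simplification.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper: checks a LIKE 'prefix%' style match by pulling
// collation elements one at a time and stopping at the first mismatch.
final class LazyLikePrefixSketch {

    static boolean startsWith(RuleBasedCollator collator, String value, String prefix) {
        CollationElementIterator valueElements = collator.getCollationElementIterator(value);
        CollationElementIterator prefixElements = collator.getCollationElementIterator(prefix);
        while (true) {
            int prefixElement = prefixElements.next();
            if (prefixElement == CollationElementIterator.NULLORDER) {
                return true; // every element of the prefix was matched
            }
            int valueElement = valueElements.next();
            if (valueElement != prefixElement) {
                // Also covers the value running out of elements (NULLORDER).
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Assumes the platform returns a RuleBasedCollator for this locale.
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        System.out.println(startsWith(collator, "BXXXXXXXXXXXXXXXXXXXXXXX", "A"));
        System.out.println(startsWith(collator, "Apple", "A"));
    }
}
```

For the value 'BXXXXXXXXXXXXXXXXXXXXXXX' and the pattern 'A%', the loop requests a single element from each iterator before returning false.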
  
  12)Add tests for this feature. This is a broad umbrella task, but I do want to mention 3 specific
tests that we should be running
  
