db-derby-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "BuiltInLanguageBasedOrderingDERBY-1478" by MamtaSatoor
Date Mon, 02 Apr 2007 19:45:57 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by MamtaSatoor:
http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478

------------------------------------------------------------------------------
  == Outstanding items ==
  Language changes
  
- 1)The type definition of a data type is described by DTD (DataTypeDescriptor). This DTD
will have two additional attributes called collation type and collation derivation. As per
SQL spec, the collation derivation can hold 3 values, "explicit", "implicit" and "none". In
Derby 10.3, the collation derivation will never be "explicit" because Derby 10.3 does not
support SQL Standard's COLLATE clause. In Derby 10.3, the collation derivation can be "implicit"
or "none". If collation derivation is "none", then it means the collation type can't be determined.
This can happen when an aggregate function is working with operands of different collation
types. If the result of such an aggregate function is character string type, then it's collation
derivation will be "none", ie it can not be determined. Other than this aggregate "none" case,
the collation derivation will always be "implicit" and collation type will be UCS_BASIC/TERRITORY_BASED.
Which one of the 2 collation types is pick
 ed for a character string type is explained in section "Collation Determination".
+ 1)The type definition of a data type is described by DTD (DataTypeDescriptor). This DTD
will have two additional attributes called collation type and collation derivation. As per
SQL spec, the collation derivation can hold 3 values, "explicit", "implicit" and "none". In
Derby 10.3, the collation derivation will never be "explicit" because Derby 10.3 does not
support SQL Standard's COLLATE clause. In Derby 10.3, the collation derivation can be "implicit"
or "none". If collation derivation is "none", then it means the collation type can't be determined.
This can happen when an aggregate function is working with operands of different collation
types. If the result of such an aggregate function is character string type, then it's collation
derivation will be "none", ie it can not be determined. Other than this aggregate "none" case,
the collation derivation will always be "implicit" and collation type will be UCS_BASIC/TERRITORY_BASED.
Which one of the 2 collation types is pick
 ed for a character string type is explained in detail in section "Collation Determination".
  
- 2)The TypeDescriptor for character columns always has 0 for scale because scale does not
apply to character datatypes. Starting Derby 10.3, this scale field in TypeDescriptor will
be overloaded to indicate the collate type of the character. So, if user has requested for
TERRITORY_BASED collation, then the scale in TypeDescriptor for user columns(character) will
be 1(TERRITORY_BASED). The scale will be always 0(UCS_BASIC) for SYS schema character columns
and for databases with collation set to UCS_BASIC. These changes will gon in readExternal
and writeExternal methods of TypeDescriptor.
+ 2)The TypeDescriptor for character columns always has 0 for scale because scale does not
apply to character datatypes. Starting Derby 10.3, this scale field in TypeDescriptor will
be overloaded to indicate the collate type of the character. So, if user has requested for
TERRITORY_BASED collation, then the scale in TypeDescriptor for user columns(character) will
be 1(TERRITORY_BASED). The scale will be always 0(UCS_BASIC) for SYS schema character columns
and for databases with collation set to UCS_BASIC. These changes will go in readExternal and
writeExternal methods of TypeDescriptor. Using the value 0 for UCS_BASIC will ensure that
pre-10.3 databases with scale field as 0 in TypeDescriptor will continue to use UCS_BASIC
after upgrade to 10.3, because 0 in scale field corresponds to UCS_BASIC collation type.
  
  3)When a character column is added using CREATE TABLE/ALTER TABLE, make sure that the correct
collate type is populated in the TypeDescriptor's scale field in the SYS.SYSCOLUMNS table.
  
@@ -87, +87 @@

  
  4)This item is related to item 3. When DVF gets called by store to create right DVD for
given formatid and collation type, for formatids associated with character datatypes, it will
first create the base character datatype class which is say SQLChar. Then it will call getValue
method on the DVD with the RuleBasedCollator corresponding to the collation type as the parameter.
(This RuleBasedCollator will be null for UCS_BASIC collation). The getValue method will return
SQLChar or CollatorSQLChar depending on whether RuleBasedCollator is null or not. getValue
is the new method which needs to be added to the interface StringDataValue.
  
- Performance items
- 
- 1)CollatorSQLChar has a method called getCollationElementsForString which currently gets
called by like method. getCollationElementsForString gets the collation elements for the value
of CollatorSQLChar class. But say like method is looking for pattern 'A%' and the value of
CollatorSQLChar is 'BXXXXXXXXXXXXXXXXXXXXXXX'. This is eg of one case where it would have
been better to get collation element one character of CollatorSQLChar value at a time so we
don't go through the process of getting collation elements for the entire string when we don't
really need. This is a performance issue and could be taken up at the end of the implementation.
Comments on this from Dan and Dag can be found in DERBY-2416. 
- 
  Testing
  
  1)Add tests for this feature. This a broad umbrella task but I do want to mention 3 specific
tests that we should be testing
@@ -102, +98 @@

  c)CREATE VIEW should have collation type UCS_BASIC/TERRIOTRY_BASED assocatied with it's
character columns. The exact collation will be determined by what is the value of the COLLATION
attribute. This is same as what would happen for CREATE TABLE. Have a test for global temporary
tables with character colums too.
  
  Metadata query changes
+ 
  1)Fix the metadata queries so that string literals can be compared with character columns
from system schema. String literals will have the USER character set associated with them
and system character columns will have SQL_IDENTIFIER associated with them. In order to compare
them, both the sides of the comparison should come from same character set. Following casting
of string literal will achieve that. syscharcol = CAST(string_literal as varchar(128)). 
  
  Network Server
+ 
  1)At this point, I am not sure what kind of work (if any) will be involved for Network Server.
+ 
+ Performance items
+ 
+ 1)CollatorSQLChar has a method called getCollationElementsForString which currently gets
called by like method. getCollationElementsForString gets the collation elements for the value
of CollatorSQLChar class. But say like method is looking for pattern 'A%' and the value of
CollatorSQLChar is 'BXXXXXXXXXXXXXXXXXXXXXXX'. This is eg of one case where it would have
been better to get collation element one character of CollatorSQLChar value at a time so we
don't go through the process of getting collation elements for the entire string when we don't
really need. This is a performance issue and could be taken up at the end of the implementation.
Comments on this from Dan and Dag can be found in DERBY-2416. 
  
  Miscellaneous item
  1)Make sure the space padding at the end of various character datatypes is implemented commented
correctly in javadocs. This padding is used in collation related methods. For eg check SQLChar.stringCompare
method.

Mime
View raw message