db-derby-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "BuiltInLanguageBasedOrderingDERBY-1478" by MamtaSatoor
Date Mon, 02 Apr 2007 03:09:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by MamtaSatoor:
http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478

------------------------------------------------------------------------------
  12)And last but not least, the collation methods will follow the rules defined by SQL spec
in Section 9.13 Collation determination Syntax Rules 2 and 3e. In other words, at least one
operand shall have a declared type collation (that means if the comparison is sysChar1|userChar1
> sysChar2|userChar2, then such a comparison will fail because both sides of > operator
have collation derivation of none after applying the rules of 10 above) AND every operand
whose collation derivation is implicit shall have the same declared type collation (that means
if the comparison is sysChar1 > userChar1WithTerritoryBasedCollation, then such a comparison
will fail because left operand has collation derivation as implicit and collation type as
UCS_BASIC and the right operand has collation derivation implicit and collation type as TERRITORY_BASED)

   
  == Outstanding items ==
+ Language changes
+ 1)The type definition of a data type is described by DTD (DataTypeDescriptor). This DTD
will have two additional attributes called collation type and collation derivation. As per
SQL spec, the collation derivation can hold 3 values, "explicit", "implicit" and "none". In
Derby 10.3, the collation derivation will never be "explicit" because Derby 10.3 does not
support SQL Standard's COLLATE clause. In Derby 10.3, the collation derivation can be "implicit"
or "none". If collation derivation is "none", then it means the collation type can't be determined.
This can happen when an aggregate function is working with operands of different collation
types. If the result of such an aggregate function is character string type, then it's collation
derivation will be "none", ie it can not be determined. Other than this aggregate "none" case,
the collation derivation will always be "implicit" and collation type will be UCS_BASIC/TERRITORY_BASED.
Which one of the 2 collation types is pick
 ed for a character string type is explained in section "Collation Determination".
+ 
+ 2)The TypeDescriptor for character columns always has 0 for scale because scale does not
apply to character datatypes. Starting Derby 10.3, this scale field in TypeDescriptor will
be overloaded to indicate the collate type of the character. So, if user has requested for
TERRITORY_BASED collation, then the scale in TypeDescriptor for user columns(character) will
be 1(TERRITORY_BASED). The scale will be always 0(UCS_BASIC) for SYS schema character columns
and for databases with collation set to UCS_BASIC. 
+ 
+ 3)When a character column is added using CREATE TABLE/ALTER TABLE, make sure that the correct
collate type is populated in the TypeDescriptor's scale field in the SYS.SYSCOLUMNS table.
+ 
+ 4)Store column level metadata for collate in Language Layer as well. This will happen in
DataTypeDescriptor(DTD) with the addition of int collateType field. It will be set to 0(UCS_BASIC)/1(TERRITORY_BASED)/-1(UNKNOWN).
There will be get and set methods on DTD for this new field.
+ 
+ Store changes
  1)Store column level metadata for collate in Store. Store keeps a version number that describes
the strucutre of column level metadata. For existing pre-10.3 databases which get upgraded
to 10.3 and for new 10.3 databases with default collatoin(UCS_BASIC), the structure of column
level metadata will remain same as 10.2 structure of column level metadata, ie they will not
include collate information in their store metadata. A new version would be used in Store
for structure of column level metadata if the newly created 10.3 database has asked for territory
based collation. In other words, information about collate will be kept in Store column level
metadata only if we are working with a 10.3 newly created database with territory based collation.
This approach will make sure that we do not have to do an on-disk store metadata upgrade when
upgrading a pre-10.3 database to 10.3 version.
- 
- 2)Store column level metadata for collate in Language Layer as well. This will happen in
DataTypeDescriptor(DTD) with the addition of int collateType field. It will be set to 0(UCS_BASIC)/1(TERRITORY_BASED)/-1(UNKNOWN).
There will be get and set methods on DTD for this new field.
- 
- 3)The type definition of a data type is described by DTD (DataTypeDescriptor). This DTD
will have two additional attributes called collation type and collation derivation. As per
SQL spec, the collation derivation can hold 3 values, "explicit", "implicit" and "none". In
Derby 10.3, the collation derivation will never be "explicit" because Derby 10.3 does not
support SQL Standard's COLLATE clause. In Derby 10.3, the collation derivation can be "implicit"
or "none". If collation derivation is "none", then it means the collation type can't be determined.
This can happen when an aggregate function is working with operands of different collation
types. If the result of such an aggregate function is character string type, then it's collation
derivation will be "none", ie it can not be determined. Other than this aggregate "none" case,
the collation derivation will always be "implicit" and collation type will be UCS_BASIC/TERRITORY_BASED.
Which one of the 2 collation types is pick
 ed for a character string type is explained in section "Collation Determination".
- 
- 4)The TypeDescriptor for character columns always has 0 for scale because scale does not
apply to character datatypes. Starting Derby 10.3, this scale field in TypeDescriptor will
be overloaded to indicate the collate type of the character. So, if user has requested for
TERRITORY_BASED collation, then the scale in TypeDescriptor for user columns(character) will
be 1(TERRITORY_BASED). The scale will be always 0(UCS_BASIC) for SYS schema character columns
and for databases with collation set to UCS_BASIC. 
- 
- 5)When a character column is added using CREATE TABLE/ALTER TABLE, make sure that the correct
collate type is populated in the TypeDescriptor's scale field in the SYS.SYSCOLUMNS table.
- 
- 6)For both a newly created 10.3 database and an upgraded 10.3 database, make sure that the
scale for character datatypes continue to be 0 (rather than the collation type value) through
the metadata. The overloading of scale in TypeDescriptor as collation for character datatypes
should be transparent to the end user. We should include test for the scale of character datatype.
  
  7)Currently, store uses Monitor to create DVD template rows. The logic of creating DVDs
using formatids should be factored out from Monitor into DataValueFactory. Talking in terms
of code, RowUtil.newClassInfoTemplate should call DVF.classFromIdentifier rather than Monitor.classFromIdentifier.
  
@@ -86, +86 @@

  11)CollatorSQLChar has a method called getCollationElementsForString which currently gets
called by like method. getCollationElementsForString gets the collation elements for the value
of CollatorSQLChar class. But say like method is looking for pattern 'A%' and the value of
CollatorSQLChar is 'BXXXXXXXXXXXXXXXXXXXXXXX'. This is eg of one case where it would have
been better to get collation element one character of CollatorSQLChar value at a time so we
don't go through the process of getting collation elements for the entire string when we don't
really need. This is a performance issue and could be taken up at the end of the implementation.
Comments on this from Dan and Dag can be found in DERBY-2416. 
  
  12)Add tests for this feature. This a broad umbrella task but I do want to mention 3 specific
tests that we should be testing
- a)Make sure the scale of the character datatypes is always 0 and it didn't get impacted
negatively by the overloading of scale field as collation type in TypeDescriptor.
+ 
+ a)For both a newly created 10.3 database and an upgraded 10.3 database, make sure that metadata
continues to show the scale for character datatypes as 0 (rather than the collation type value).
That is, test that the scale of the character datatypes is always 0 and it didn't get impacted
negatively by the overloading of scale field as collation type in TypeDescriptor.
+ 
  b)Test case for recovery - have an outstanding transaction with insert/delete/updates that
affect one or more character indexes (all with a collation setting that is different from
default collation). Make sure those log records get to the log and then crash the server.
Restarting the server will then run through the recovery code and will ensure that we test
for correct collation usage at recovery time. Mike has put more info about this in DERBY-2336.
+ 
  c)CREATE VIEW should have collation type UCS_BASIC/TERRIOTRY_BASED assocatied with it's
character columns. The exact collation will be determined by what is the value of the COLLATION
attribute. This is same as what would happen for CREATE TABLE. Have a test for global temporary
tables with character colums too.
  
  13)Make sure the space padding at the end of various character datatypes is implemented
commented correctly in javadocs. This padding is used in collation related methods. For eg
check SQLChar.stringCompare method.

Mime
View raw message