Return-Path: Delivered-To: apmail-db-derby-commits-archive@www.apache.org Received: (qmail 84584 invoked from network); 2 Apr 2007 19:46:18 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 2 Apr 2007 19:46:18 -0000 Received: (qmail 86707 invoked by uid 500); 2 Apr 2007 19:46:25 -0000 Delivered-To: apmail-db-derby-commits-archive@db.apache.org Received: (qmail 86675 invoked by uid 500); 2 Apr 2007 19:46:25 -0000 Mailing-List: contact derby-commits-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: Reply-To: "Derby Development" List-Id: Delivered-To: mailing list derby-commits@db.apache.org Received: (qmail 86658 invoked by uid 99); 2 Apr 2007 19:46:25 -0000 Received: from herse.apache.org (HELO herse.apache.org) (140.211.11.133) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 02 Apr 2007 12:46:25 -0700 X-ASF-Spam-Status: No, hits=-100.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.130] (HELO eos.apache.org) (140.211.11.130) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 02 Apr 2007 12:46:17 -0700 Received: from eos.apache.org (localhost [127.0.0.1]) by eos.apache.org (Postfix) with ESMTP id 8A4B159A07 for ; Mon, 2 Apr 2007 19:45:57 +0000 (GMT) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: Apache Wiki To: derby-commits@db.apache.org Date: Mon, 02 Apr 2007 19:45:57 -0000 Message-ID: <20070402194557.14461.96306@eos.apache.org> Subject: [Db-derby Wiki] Update of "BuiltInLanguageBasedOrderingDERBY-1478" by MamtaSatoor X-Virus-Checked: Checked by ClamAV on apache.org Dear Wiki user, You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification. The following page has been changed by MamtaSatoor: http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478 ------------------------------------------------------------------------------ == Outstanding items == Language changes - 1)The type definition of a data type is described by DTD (DataTypeDescriptor). This DTD will have two additional attributes called collation type and collation derivation. As per SQL spec, the collation derivation can hold 3 values, "explicit", "implicit" and "none". In Derby 10.3, the collation derivation will never be "explicit" because Derby 10.3 does not support SQL Standard's COLLATE clause. In Derby 10.3, the collation derivation can be "implicit" or "none". If collation derivation is "none", then it means the collation type can't be determined. This can happen when an aggregate function is working with operands of different collation types. If the result of such an aggregate function is character string type, then it's collation derivation will be "none", ie it can not be determined. Other than this aggregate "none" case, the collation derivation will always be "implicit" and collation type will be UCS_BASIC/TERRITORY_BASED. Which one of the 2 collation types is pick ed for a character string type is explained in section "Collation Determination". + 1)The type definition of a data type is described by DTD (DataTypeDescriptor). This DTD will have two additional attributes called collation type and collation derivation. As per SQL spec, the collation derivation can hold 3 values, "explicit", "implicit" and "none". In Derby 10.3, the collation derivation will never be "explicit" because Derby 10.3 does not support SQL Standard's COLLATE clause. In Derby 10.3, the collation derivation can be "implicit" or "none". If collation derivation is "none", then it means the collation type can't be determined. This can happen when an aggregate function is working with operands of different collation types. If the result of such an aggregate function is character string type, then it's collation derivation will be "none", ie it can not be determined. Other than this aggregate "none" case, the collation derivation will always be "implicit" and collation type will be UCS_BASIC/TERRITORY_BASED. Which one of the 2 collation types is pick ed for a character string type is explained in detail in section "Collation Determination". - 2)The TypeDescriptor for character columns always has 0 for scale because scale does not apply to character datatypes. Starting Derby 10.3, this scale field in TypeDescriptor will be overloaded to indicate the collate type of the character. So, if user has requested for TERRITORY_BASED collation, then the scale in TypeDescriptor for user columns(character) will be 1(TERRITORY_BASED). The scale will be always 0(UCS_BASIC) for SYS schema character columns and for databases with collation set to UCS_BASIC. These changes will gon in readExternal and writeExternal methods of TypeDescriptor. + 2)The TypeDescriptor for character columns always has 0 for scale because scale does not apply to character datatypes. Starting Derby 10.3, this scale field in TypeDescriptor will be overloaded to indicate the collate type of the character. So, if user has requested for TERRITORY_BASED collation, then the scale in TypeDescriptor for user columns(character) will be 1(TERRITORY_BASED). The scale will be always 0(UCS_BASIC) for SYS schema character columns and for databases with collation set to UCS_BASIC. These changes will go in readExternal and writeExternal methods of TypeDescriptor. Using the value 0 for UCS_BASIC will ensure that pre-10.3 databases with scale field as 0 in TypeDescriptor will continue to use UCS_BASIC after upgrade to 10.3, because 0 in scale field corresponds to UCS_BASIC collation type. 3)When a character column is added using CREATE TABLE/ALTER TABLE, make sure that the correct collate type is populated in the TypeDescriptor's scale field in the SYS.SYSCOLUMNS table. @@ -87, +87 @@ 4)This item is related to item 3. When DVF gets called by store to create right DVD for given formatid and collation type, for formatids associated with character datatypes, it will first create the base character datatype class which is say SQLChar. Then it will call getValue method on the DVD with the RuleBasedCollator corresponding to the collation type as the parameter. (This RuleBasedCollator will be null for UCS_BASIC collation). The getValue method will return SQLChar or CollatorSQLChar depending on whether RuleBasedCollator is null or not. getValue is the new method which needs to be added to the interface StringDataValue. - Performance items - - 1)CollatorSQLChar has a method called getCollationElementsForString which currently gets called by like method. getCollationElementsForString gets the collation elements for the value of CollatorSQLChar class. But say like method is looking for pattern 'A%' and the value of CollatorSQLChar is 'BXXXXXXXXXXXXXXXXXXXXXXX'. This is eg of one case where it would have been better to get collation element one character of CollatorSQLChar value at a time so we don't go through the process of getting collation elements for the entire string when we don't really need. This is a performance issue and could be taken up at the end of the implementation. Comments on this from Dan and Dag can be found in DERBY-2416. - Testing 1)Add tests for this feature. This a broad umbrella task but I do want to mention 3 specific tests that we should be testing @@ -102, +98 @@ c)CREATE VIEW should have collation type UCS_BASIC/TERRIOTRY_BASED assocatied with it's character columns. The exact collation will be determined by what is the value of the COLLATION attribute. This is same as what would happen for CREATE TABLE. Have a test for global temporary tables with character colums too. Metadata query changes + 1)Fix the metadata queries so that string literals can be compared with character columns from system schema. String literals will have the USER character set associated with them and system character columns will have SQL_IDENTIFIER associated with them. In order to compare them, both the sides of the comparison should come from same character set. Following casting of string literal will achieve that. syscharcol = CAST(string_literal as varchar(128)). Network Server + 1)At this point, I am not sure what kind of work (if any) will be involved for Network Server. + + Performance items + + 1)CollatorSQLChar has a method called getCollationElementsForString which currently gets called by like method. getCollationElementsForString gets the collation elements for the value of CollatorSQLChar class. But say like method is looking for pattern 'A%' and the value of CollatorSQLChar is 'BXXXXXXXXXXXXXXXXXXXXXXX'. This is eg of one case where it would have been better to get collation element one character of CollatorSQLChar value at a time so we don't go through the process of getting collation elements for the entire string when we don't really need. This is a performance issue and could be taken up at the end of the implementation. Comments on this from Dan and Dag can be found in DERBY-2416. Miscellaneous item 1)Make sure the space padding at the end of various character datatypes is implemented commented correctly in javadocs. This padding is used in collation related methods. For eg check SQLChar.stringCompare method.