db-derby-dev mailing list archives

From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-1478) Add built in language based ordering and like processing to Derby
Date Tue, 06 Feb 2007 15:02:06 GMT

    [ https://issues.apache.org/jira/browse/DERBY-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470603 ]

Mamta A. Satoor commented on DERBY-1478:

Rick, I looked at the SQL specification (Part 2) regarding SQL identifiers. For background, here is some
general information on SQL identifiers from the SQL spec:
<Start of contents from SQL spec>
1) As per SQL specification Part 2, Section 4.2.4, the character repertoire for SQL identifiers,
SQL_IDENTIFIER, consists of the <SQL language character>s (Latin characters and digits) and
all other characters that the SQL-implementation supports for use in <regular identifier>.
Beyond this, everything else related to the SQL_IDENTIFIER character repertoire is implementation-defined.
Specifically:
2) Section 4.2.5, Character encoding form, Pg 22, says SQL_IDENTIFIER is an implementation-defined
character encoding form, applicable to the SQL_IDENTIFIER character repertoire.
3) Section 4.2.6, Collation, Pg 23, says SQL_IDENTIFIER is an implementation-defined collation,
applicable to the SQL_IDENTIFIER character repertoire.
4) And lastly, in Section 4.2.7, Character Sets, SQL_IDENTIFIER is a character set whose repertoire
is SQL_IDENTIFIER and whose character encoding form is SQL_IDENTIFIER. The name of its default
collation is SQL_IDENTIFIER.
5) Section, Pg 19, talks about case folding. <fold> is a pair of functions for
converting all the lowercase and title case characters in a given string to uppercase, or
all the uppercase and title case characters to lowercase. A lowercase character is a character
in the Unicode General Category class "Ll", and an uppercase character is a character in the
Unicode General Category class "Lu".
<End of contents from SQL spec>
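To illustrate point 5, here is a minimal Java sketch of the upper-case direction of <fold> as described above: it decides what to convert purely from the Unicode General Category ("Ll" or "Lt") and never consults a locale. The class and method names (FoldSketch, foldUpper) are hypothetical and are not part of Derby. It relies on the standard Character.getType(int) and Character.toUpperCase(int), which apply only the locale-independent 1:1 mappings from the Unicode data.

```java
/**
 * Hypothetical sketch (not Derby code) of the spec's <fold>, upper-case
 * direction, driven only by Unicode General Category rather than a locale.
 */
public class FoldSketch {

    /** True for General Category "Ll" (lowercase letter) or "Lt" (titlecase letter). */
    static boolean isLowerOrTitle(int cp) {
        int type = Character.getType(cp);
        return type == Character.LOWERCASE_LETTER || type == Character.TITLECASE_LETTER;
    }

    /** Uppercases Ll and Lt characters; every other character is copied unchanged. */
    static String foldUpper(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp ->
                sb.appendCodePoint(isLowerOrTitle(cp) ? Character.toUpperCase(cp) : cp));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(foldUpper("myTable_1")); // MYTABLE_1
        // U+01C5 (titlecase "Dz with caron", category Lt) folds to U+01C4
        System.out.println(Integer.toHexString(foldUpper("\u01C5").codePointAt(0))); // 1c4
    }
}
```

Filtering on the category first keeps the sketch close to the spec's wording. Character.toUpperCase alone would also map a few characters outside "Ll"/"Lt".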

From the information above, we see that the SQL specification leaves the CEF (character encoding form)
and collation for SQL identifiers as implementation-defined, but I do not see it specifically saying that
case folding is implementation-defined. Even the section on Pg 19, second paragraph, talks
about converting case in a generic manner in the context of Unicode, not the English locale.

So, I am not sure why Derby/Cloudscape chose to use the English locale for case conversion of
SQL identifiers. Derby's StringUtil class, where the SQL case conversion code lives, has the following comment:
	// The functions below are used for uppercasing SQL in a consistent manner.
	// Cloudscape will uppercase Turkish to the English locale to avoid i
	// uppercasing to an uppercase dotted i. In future versions, all 
	// casing will be done in English.   The result will be that we will get
	// only the 1:1 mappings  in 
	// http://www.unicode.org/Public/3.0-Update1/UnicodeData-3.0.1.txt
	// and avoid the 1:n mappings in 
	// Any SQL casing should use these functions
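The following minimal Java sketch makes the Turkish "i" problem mentioned in that comment concrete. It compares three ways to uppercase an identifier. CasingSketch and its methods are hypothetical illustration names and are not Derby's StringUtil API. The sketch also shows that String.toUpperCase(Locale.ENGLISH) on its own still applies 1:n mappings such as ß to "SS", whereas per-character Character.toUpperCase keeps only 1:1 mappings.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not Derby's StringUtil) comparing three ways
 * to uppercase an SQL identifier.
 */
public class CasingSketch {

    /** Locale-sensitive uppercasing: the result depends on the locale given. */
    static String upperIn(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    /** Uppercasing fixed to the English locale, as the StringUtil comment describes. */
    static String upperEnglish(String s) {
        return s.toUpperCase(Locale.ENGLISH);
    }

    /** Per-code-point uppercasing: only 1:1 mappings, never 1:n expansions. */
    static String upperOneToOne(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> sb.appendCodePoint(Character.toUpperCase(cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // In a Turkish locale 'i' uppercases to dotted capital I (U+0130),
        // so "quit" no longer matches the ASCII name "QUIT".
        System.out.println(upperIn("quit", turkish).equals("QUIT")); // false
        System.out.println(upperEnglish("quit").equals("QUIT"));     // true
        // String.toUpperCase applies the 1:n mapping for sharp s ...
        System.out.println(upperEnglish("stra\u00DFe"));             // STRASSE
        // ... while Character.toUpperCase has no 1:1 mapping for it.
        System.out.println(upperOneToOne("stra\u00DFe").equals("STRA\u00DFE")); // true
    }
}
```

This is only one well-known example of the kind of locale-dependent surprise the comment guards against. It is not necessarily the full set of issues Dan had in mind.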

Dan, you mentioned in one of your comments on this JIRA entry that "Currently the uppercasing
of SQL statements and identifiers is fixed as English to avoid unexpected issue with other
languages". Can you please explain what you mean by unexpected issues? Is that also the reason
for recommending the same behavior for system tables?

> Add built in language based ordering and like processing to Derby
> -----------------------------------------------------------------
>                 Key: DERBY-1478
>                 URL: https://issues.apache.org/jira/browse/DERBY-1478
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions:
>            Reporter: Kathey Marsden
>         Assigned To: Mamta A. Satoor
>         Attachments: DERBY-1478_FunctionalSpecV1.html
> It would be good for Derby to have built in Language based ordering based on locale specific
> Language based ordering is an important feature for international deployment. DERBY-533
> offers one implementation option for this, but according to the discussion in that issue National
> Character Types carry a fair amount of baggage with them, especially in the form of concerns
> about conversion to and from datetime and number types. Rick mentioned SQL language for
> collations as an option for language based ordering. There may be other options too, but I
> thought it worthwhile to add an issue for the high level functional concern, so the best choice
> can be made for implementation without assuming that National Character Types is the only
> For possible 10.1 workaround and examples see:
> http://wiki.apache.org/db-derby/LanguageBasedOrdering

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
