db-derby-dev mailing list archives

From Daniel John Debrunner <...@apache.org>
Subject Re: Collation feature discussion
Date Mon, 26 Mar 2007 20:48:42 GMT
Roy Lyseng wrote:
> Daniel John Debrunner wrote:

>> Thus Derby could have two character sets:
>>  - USER - UCS repertoire with default collation of UCS_BASIC or
>>    UNICODE depending on value of collation JDBC attribute at create
>>    database time
>>  - SYSTEM - UCS repertoire with default collation of UCS_BASIC

> I think that you should carefully consider the implications of using two 
> character sets. Among other things, it means that two strings with 
> different character sets are not immediately comparable. And as far as I 
> know, this applies to literals as well. What this means (I think) is 
> that if columns in system tables are defined with character set SYSTEM, 
>  columns in user-defined tables are defined with character set USER, and 
> literals are of type USER, then you cannot immediately compare literals 
> with the character columns in the system tables.

Note that I'm using "character set" as the SQL Standard defines it
(section 4.2.7), and different character sets are comparable if they
have a collation in common (section 4.2.2).
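To make that comparability rule concrete, here is a minimal Python
sketch. It is an illustrative model only, not Derby code: the
CharacterSet class and charsets_comparable function are hypothetical,
and the outcome depends entirely on which collations each character set
is assumed to carry (here, exactly the ones listed for it).

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical model: a character set is a repertoire plus the collations
# assumed to be defined for it; collations[0] is taken as its default.
@dataclass(frozen=True)
class CharacterSet:
    name: str
    repertoire: str
    collations: Tuple[str, ...]

    @property
    def default_collation(self) -> str:
        return self.collations[0]


def charsets_comparable(a: CharacterSet, b: CharacterSet) -> bool:
    """Strings are comparable if they use the same character set, or if
    their different character sets have at least one collation in common."""
    if a == b:
        return True
    return bool(set(a.collations) & set(b.collations))


# Assumed instances mirroring the proposal: SYSTEM always defaults to
# UCS_BASIC; USER defaults to UCS_BASIC or UNICODE at create time.
SYSTEM = CharacterSet("SYSTEM", "UCS", ("UCS_BASIC",))
USER_BASIC = CharacterSet("USER", "UCS", ("UCS_BASIC",))
USER_UNICODE = CharacterSet("USER", "UCS", ("UNICODE",))
```

Under these assumptions SYSTEM and a UCS_BASIC-default USER share a
collation and are comparable, while SYSTEM and a UNICODE-only USER are
not; if USER were also assumed to carry UCS_BASIC, the answer would flip.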

I think the SQL Standard also mandates multiple character sets if one
wants different default collations. The expression CURRENT USER has a
mandated character set of SQL_IDENTIFIER, so Derby must support that,
and SQL identifiers are required to have UCS_BASIC collation. A CREATE
TABLE then picks up its collation from its default *character set*,
which comes from its schema (11.4 SR10b), so having a default collation
different from that of SQL_IDENTIFIER requires a different character set.

> Another option is to use one character set, but use different collations 
> for different types of tables. You may define that character columns in 
> system tables are created using collation UCS_BASIC, while all user 
> tables are created with a user-defined collation. Because all columns 
> are defined using the same character set, all columns and literals will 
> be comparable.

Is that correct? So far the discussion has assumed that columns with
different implicit collations are not comparable (see 9.3 SR3e).
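A simplified Python sketch of that assumption, for illustration only:
it assumes every operand has implicit collation derivation and that no
<collate clause> exists (so there is no explicit derivation). It is a
reading of the rule as described in this thread, not a transcription of
9.3, and result_collation and implicit_collations_comparable are
hypothetical names.

```python
from enum import Enum
from typing import Optional, Sequence, Tuple


class Derivation(Enum):
    IMPLICIT = "implicit"
    NONE = "none"


def result_collation(
    implicit_collations: Sequence[str],
) -> Tuple[Optional[str], Derivation]:
    """Collation of an operation whose operands all have implicit
    derivation: implicit X if every operand uses collation X, else none."""
    distinct = set(implicit_collations)
    if len(distinct) == 1:
        return next(iter(distinct)), Derivation.IMPLICIT
    return None, Derivation.NONE


def implicit_collations_comparable(left: str, right: str) -> bool:
    # A comparison needs a usable collation; derivation "none" means
    # the two operands cannot be compared.
    _, derivation = result_collation([left, right])
    return derivation is Derivation.IMPLICIT
```

In this model a system column (UCS_BASIC) compared with a user column
created under a UNICODE default yields derivation "none", which is why
such comparisons would be rejected.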

I don't think it's a goal to make columns in system tables comparable
with user columns, since if they have different collations the standard
says they are not comparable [assuming no implementation of a <collate
clause>].

A goal is to have the SQL queries used for JDBC metadata continue to
work, which is what the current discussion around literals is about.
The standard seems to define the character set of a string literal only
poorly.

> Just remember that when comparing two strings with different defined 
> collations, you need to consider the collation rules defined by the SQL 
> standard.

Right, I think we are trying to understand those rules, how they apply
to Derby, and what they mean for the proposed changes in DERBY-1478.

