db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: Collation feature discussion
Date Mon, 26 Mar 2007 15:55:58 GMT
Hi Mamta,

Thanks for this extensive write-up. This helps me puzzle through the 
issues although I'm afraid I'm still muddled. Some comments follow inline.

Mamta Satoor wrote:
> lots of good stuff ...
> SQL spec also talks in various sections about default collation 
> associated with character repertoire, character set, SQL schema, SQL 
> session. These defaults are used to determine the collation 
> requirement for character string types in ambiguous places. (In a SQL 
> implementation of SQL spec's collation clause support, if the rules 
> for the collation determination is not what the application wants, 
> then the user can use the COLLATE clause to override the SQL spec 
> behavior.) For string literals, which has been the focus of recent 
> collation discussion on the Derby list, SQL spec specifies(Section 5.3 
> <literal> Syntax Rule 15) that it's collation will be the collation of 
> the character set (which is the default collation of the character 
> set. Every character set defined in the SQL implementation is required 
> to have a default collation).
> In the absence of the COLLATE clause support in Derby 10.3, I do not 
> think we can follow the SQL spec for string literals' collation type. 
> If we decide to make user defined collation(through the JDBC url 
> attribute COLLATION) as Derby's default collation for the character 
> set, then that would mean that a 10.3 db with COLLATION attribute as 
> TERRITORY_BASED will always have string literals with TERRITORY_BASED 
> collation. This will break our metadata queries which does comparison 
> of SYS character column against string literals. The SYS character 
> column will have UCS_BASIC collation and string literals will have 
> TERRITORY_BASED collation and during the comparison, Derby will end up 
> throwing exception because character strings with different collation 
> can't be compared. If Derby 10.3 had support for COLLATE clause, then 
> we could implement SQL spec behavior for string literal and let 
> metadata queries use the COLLATE clause and users could use the 
> COLLATE clause in their queries against system tables when using 
> string literals.
I wonder if we could just implement the COLLATE clause for string 
literals. Would that be a lot of work? The metadata queries could be 
adjusted to use the COLLATE clause. I think we should avoid violating 
the SQL standard if we can.
> So, in the absence of the COLLATE clause in Derby 10.3, what I am 
> proposing is string literals have a collation type of UNKNOWN. When 
> they get used in a collation operation, these UNKNOWN collation types 
> will get their collation from the other operands involved in the 
> operation (requirement here would be that all the operands whose 
> collation type is not UNKNOWN, will have the same collation type 
> associated with them. This requirement comes from Section 9.13 
> Collation determination, Syntax Rule 3e). If all of the operands in 
> collation operation have collation type as UNKNOWN, then the collation 
> chosen would be the one specified through the COLLATION attribute on 
> the JDBC url. This last line makes Derby's collation behavior not 
> match SQL spec Section 9.13 Collation determination, Rule 2 which says 
> "At least one operand shall have a declared type collation." But 
> again, I think in the absence of SQL's COLLATE clause, we have to 
> break some rules to make Derby more flexible. If we go with my 
> proposal for string literal, then a comparison like 'aa' < 'ba' will 
> have the 2 operands with UNKNOWN collation, and Derby's collation 
> algorithim at collation time will choose the value specified for 
> COLLATION attribute in the JDBC url. For function that returns a 
> string datatype, we will have the collation type for that string 
> datatype as UNKNOWN (same as string literals).
> Other than this, I think Derby's collation type of UNKNOWN has a clear 
> mapping with "none" value of collation derivation in SQL spec.
I'm a little lost here. Could you explain what is meant by the UNKNOWN 
collation? I don't see this collation described in either the SQL 
standard or in the spec attached to DERBY-1478.
> more good stuff ...

> Syntax Rule 4 says that their collation type is the collation 
> of character set SQL_IDENTIFIER. SQL spec Section 4.2.6 Collations 
> talks about SQL_IDENTIFIER's collation being implementation defined. 
> Based on this, we can decide the collation for these USER functions 
> to be UNKNOWN. Again, the argument here is same as used for string 
> literals.
It seems to me that SQL_IDENTIFIERs have collation UCS_BASIC. This is 
how they behave when it comes to preventing two schema objects from 
having the same name. And this is what forces a collation of UCS_BASIC 
on the string columns in the system tables. I'm not sure I understand 
bullet (7). I hope it is not saying that SQL_IDENTIFIERs have UNKNOWN 
collation. I think we will get into a lot of trouble if we try to force 
SQL_IDENTIFIERs to have some collation other than UCS_BASIC.


View raw message