db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel John Debrunner <...@apache.org>
Subject Collation and string literals limitation in SQL standard?
Date Mon, 04 Jun 2007 22:33:21 GMT
Following the SQL standard the collation derivation of a string literal 
and a column reference are both "implicit". This is what the current 
collation feature is implementing.

This leads to some interesting cases once Derby supports column specific 
collations.

Take:

create table T(NAME CHAR(10) COLLATE X);

Now the boolean expression in a query

    NAME = 'Fred'

will fail if the collation for 'Fred' is not X. That could exist with 
the current implementation (literals default to the collation of the 
current schema) or the original proposal (literals default to the 
database's collation).

It gets a little stranger if a comparison expression is used within the 
create table:

  create table T(TYPE CHAR(10) COLLATE X CHECK TYPE IN ('CAR', 'PLANE', 
'BOAT');

Now the create table will fail.

Both of these could be made to work with a COLLATE clause, but it 
becomes especially strange in the CHECK clause, as the COLLATE clause 
must now appear multiple times (I think).

  create table T(TYPE CHAR(10) COLLATE X CHECK TYPE IN ('CAR' COLLATE X, 
'PLANE' COLLATE X, 'BOAT' COLLATE X);

In both of these cases it seems (to me) that what the language really 
intends is that collation to be used is that of the column reference.

Note that even without multiple user collations, Derby will have this 
problem for queries that system columns and user schemas, e.g. 
sys.systables.TABLENAME = 'MYTABLE' will fail if compiled in a user schema.

Interestingly both MySQL and Microsoft have solved this problem by 
adding additional levels to the derivation concept. SQL Standard 
supports three levels: NONE, IMPLICIT and EXPLICIT.

Microsoft has added one: Coercible-default between implicit and none.

http://msdn2.microsoft.com/en-us/library/ms179886.aspx

and string literals map to coercible-default, thus in the above examples 
the column's collation will be used.

MySQL instead has a concept of "coercibility" with six levels, 0-5. This 
is claimed to be SQL standard but seems to be from an earlier version of 
the standard. I found another reference to it in some SQL book.

http://doc.51windows.net/mysql/?url=/mysql/ch10s05.html
(section 10.5.4)

Again string literals are at a different level to column references 
leading to column collation being used.


Does anyone know of any correction in the SQL standard regarding this 
issue? I haven't been able to find one.

I'm just thinking about the future and trying to ensure that anyone who 
implements multiple user collations for Derby won't have to break 
backwards compatibility to ensure rational behaviour.

Thanks,
Dan.


Mime
View raw message