db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bernt M. Johnsen" <Bernt.John...@Sun.COM>
Subject Re: Territory based database: proposed change to current LIKE behavior vs = for character string types
Date Wed, 10 Oct 2007 13:16:31 GMT
>>>>>>>>>>>> Mamta Satoor wrote (2007-10-09 09:40:53):
> [...]
> Unicode has a concept of Contraction where a user might perceive more than
> one character as a single character in a given language. One eg of this
> would be 'AA' in Norwegian locale. Although, this Contraction is made of
> 2 Unicode characters, 'A' and 'A', a Norwegian user perceives them as
> a single character.

No. Noone in Norway would perceive 'aa' as a single character.

> In addition, in Norwegian, the collation elements for 'AA' are
> identical to collation elements for 'Å'. So, the question is what
> should SQL operation 'AA' like 'Å' return? Also, is that behavior
> same as SQL operation 'AA' = 'Å' ?

'AA' and 'Å' will in Norwegian have the same collation weight, but the
way the SQL standard is phrased, that does not apply to LIKE which
does an character by character match of the strings. I admit that 

'aa'= 'å' 


'aa' LIKE 'å' 

giving different results may seem a bit odd but I am not able to
interpret Sections 8.2 and 8.5 differently (On the other hand, that's
not the only place the SQL spec is contraintuitive for the average

On the other hand, searching and matching are handled in Unicode T10
sections 1.5 and 8 (SELECT is explicitely mentioned in 1.5). It seems
that Unicode is in agreement with how I interpret the semantics for
comparison operators but not in agreement with how I interpret the
sematics for the LIKE predicate.

Does anyone know what MySQL, Oracle, SQL Server etc does with this? If
the other major databases are in agreement, we could follow their
interpretation if we find it reasonable.
Bernt Marius Johnsen, Database Technology Group, 
Staff Engineer, Derby/Java DB
Sun Microsystems, Trondheim, Norway

View raw message