db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: territory-based collations and optimizations for the LIKE operator
Date Mon, 25 Aug 2008 16:48:04 GMT
Thanks, Mamta. This is very helpful. Based on your responses, I think 
that the following summarizes how LIKE behaves in databases with 
territory based collations:

1) The LIKE optimizations are disabled. For example, given the following 
query where A is an indexed string column

   select * from T where T.A like 'foo%'

Derby cannot  use the index on A and falls back on performing a full 
table scan.

2) The LIKE operator has Derby-specific semantics. The Derby-specifics 
semantics return a subset of the rows which qualify under the ANSI rules.

Does that sound correct to you?


Mamta Satoor wrote:
> Rick, the behavior we expect for LIKE and = in a territory based
> database can be explained by the following comment that I picked up
> from DERBY-3166
> **********************************************************
> The following applies to territory based database
> For LIKE, when comparing a pattern against a value string, we do the
> comparison of collation elements(s) for one character at a time for
> non-metacharacters. This is different than what is done for =
> operation. For =, we compare the collation elements for the entire
> string on left hand with the collation elements of the entire string
> on the right hand side.
> For eg say we are working with a territory where character 'z' has
> same collation elements as 'xy'. For such a territory consider 2
> clauses in WHERE clause
> 1)'zcb' = 'xycb'
> 2)'zcb' LIKE 'xy_b'
> For case 1), we will return TRUE because the collation elements for
> the entire string 'zcb' will match the collation elements of the
> entire string 'xycb'.
> For case 2) though, we will return FALSE because collation element(s)
> for character 'z' does not match the collation element(s) for
> character 'x'. So, as can be seen, the LIKE operation is one character
> at a time whereas = operation is the entire string at a time.
> In addition, the metacharacter _ in pattern for LIKE will consume
> *one* character in the string value. So for an eg clause 'xycb' LIKE
> '_cb' will return FALSE because metacharacter _ will consume 'x' and
> since 'c' does not match 'y', we will return FALSE.
> **********************************************************
> Is this the information you were looking for?
> thanks,
> Mamta
> On 8/22/08, Rick Hillegas <Richard.Hillegas@sun.com> wrote:
>> According to the Developer's Guide section titled "Character-based collation
>> in Derby", the LIKE operator behaves differently than the = operator. That
>> is, if you are using a territory-based collation, two strings might = one
>> another but not be LIKE one another (and vice-versa).
>> In the meantime, according to the section in the Tuning Guide titled
>> "Character string beginning with constant", LIKE expressions which begin
>> with a constant (followed by a wildcard) can be transformed into indexable
>> expressions involving the ">=" and "<" operators.
>> To my muddled understanding, it seems that one of the following occurs:
>> 1) The LIKE optimizations are disabled if you are using territory-based
>> collation
>> or
>> 2) a LIKE expression may return different results depending on your indexes
>> and/or the presence of trailing wildcards in your LIKE expression.
>> (1) seems like a performance issue which needs to be documented. (2) seems
>> like a correctness problem. What is the behavior that we expect?
>> Thanks,
>> -Rick

View raw message