db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: territory-based collations and optimizations for the LIKE operator
Date Mon, 25 Aug 2008 18:02:48 GMT
Thanks, Mamta. I'm having a hard time seeing these issues addressed by 
the descriptions of DERBY-1478 and DERBY-2793. Do you think it might 
make sense to log a new issue and link it to these old ones?

Thanks,
-Rick

Mamta Satoor wrote:
> Yes, that is correct.
>
> For 1), the disabling of optimizaiton for LIKE for a territory based
> database has jira entry DERBY-1478 for it.
>
> For 2), we have jira entry DERBY-2793
>
> Mamta
>
>
> On 8/25/08, Rick Hillegas <Richard.Hillegas@sun.com> wrote:
>   
>> Thanks, Mamta. This is very helpful. Based on your responses, I think that
>> the following summarizes how LIKE behaves in databases with territory based
>> collations:
>>
>> 1) The LIKE optimizations are disabled. For example, given the following
>> query where A is an indexed string column
>>
>>  select * from T where T.A like 'foo%'
>>
>> Derby cannot  use the index on A and falls back on performing a full table
>> scan.
>>
>> 2) The LIKE operator has Derby-specific semantics. The Derby-specifics
>> semantics return a subset of the rows which qualify under the ANSI rules.
>>
>> Does that sound correct to you?
>>
>> Thanks,
>> -Rick
>>
>>
>> Mamta Satoor wrote:
>>     
>>> Rick, the behavior we expect for LIKE and = in a territory based
>>> database can be explained by the following comment that I picked up
>>> from DERBY-3166
>>>
>>>
>>>       
>> **********************************************************
>>     
>>> The following applies to territory based database
>>> For LIKE, when comparing a pattern against a value string, we do the
>>> comparison of collation elements(s) for one character at a time for
>>> non-metacharacters. This is different than what is done for =
>>> operation. For =, we compare the collation elements for the entire
>>> string on left hand with the collation elements of the entire string
>>> on the right hand side.
>>> For eg say we are working with a territory where character 'z' has
>>> same collation elements as 'xy'. For such a territory consider 2
>>> clauses in WHERE clause
>>> 1)'zcb' = 'xycb'
>>> 2)'zcb' LIKE 'xy_b'
>>> For case 1), we will return TRUE because the collation elements for
>>> the entire string 'zcb' will match the collation elements of the
>>> entire string 'xycb'.
>>> For case 2) though, we will return FALSE because collation element(s)
>>> for character 'z' does not match the collation element(s) for
>>> character 'x'. So, as can be seen, the LIKE operation is one character
>>> at a time whereas = operation is the entire string at a time.
>>> In addition, the metacharacter _ in pattern for LIKE will consume
>>> *one* character in the string value. So for an eg clause 'xycb' LIKE
>>> '_cb' will return FALSE because metacharacter _ will consume 'x' and
>>> since 'c' does not match 'y', we will return FALSE.
>>>
>>>       
>> **********************************************************
>>     
>>> Is this the information you were looking for?
>>>
>>> thanks,
>>> Mamta
>>>
>>> On 8/22/08, Rick Hillegas <Richard.Hillegas@sun.com> wrote:
>>>
>>>
>>>       
>>>> According to the Developer's Guide section titled "Character-based
>>>>         
>> collation
>>     
>>>> in Derby", the LIKE operator behaves differently than the = operator.
>>>>         
>> That
>>     
>>>> is, if you are using a territory-based collation, two strings might =
>>>>         
>> one
>>     
>>>> another but not be LIKE one another (and vice-versa).
>>>>
>>>> In the meantime, according to the section in the Tuning Guide titled
>>>> "Character string beginning with constant", LIKE expressions which begin
>>>> with a constant (followed by a wildcard) can be transformed into
>>>>         
>> indexable
>>     
>>>> expressions involving the ">=" and "<" operators.
>>>>
>>>> To my muddled understanding, it seems that one of the following occurs:
>>>>
>>>> 1) The LIKE optimizations are disabled if you are using territory-based
>>>> collation
>>>>
>>>> or
>>>>
>>>> 2) a LIKE expression may return different results depending on your
>>>>         
>> indexes
>>     
>>>> and/or the presence of trailing wildcards in your LIKE expression.
>>>>
>>>> (1) seems like a performance issue which needs to be documented. (2)
>>>>         
>> seems
>>     
>>>> like a correctness problem. What is the behavior that we expect?
>>>>
>>>> Thanks,
>>>> -Rick
>>>>
>>>>
>>>>
>>>>         
>>     


Mime
View raw message