db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3854) Implement LIKE transformations and optimizations for databases using territory-based collations
Date Fri, 29 Aug 2008 14:24:44 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626986#action_12626986 ]

Knut Anders Hatlen commented on DERBY-3854:
-------------------------------------------

I don't know whether this will work, or whether I'm using the correct
terminology, but here are some thoughts on how we could optimize LIKE
when using collation:

The collators we use are always of type RuleBasedCollator, from which
we can retrieve the rules by calling getRules(). By looking at the
rules, we may be able to perform some optimizations. One example:
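
For illustration, a minimal sketch of getting at the rule string through
the standard java.text API. The class and method names (CollatorRules,
rulesOf) are made up for this example and are not Derby code, and which
rules come back for a given locale depends on the JRE's locale data.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper, not part of Derby.
final class CollatorRules {

    /** Returns the collator's rule string, or null if it is not rule-based. */
    static String rulesOf(Collator collator) {
        if (collator instanceof RuleBasedCollator) {
            return ((RuleBasedCollator) collator).getRules();
        }
        return null;
    }

    public static void main(String[] args) {
        // The locale is only an example; the rules returned are JRE-specific.
        Collator collator = Collator.getInstance(Locale.forLanguageTag("no-NO"));
        String rules = rulesOf(collator);
        System.out.println(rules == null
                ? "collator is not rule-based"
                : "rule string has " + rules.length() + " characters");
    }
}
```

Any analysis of the kind described here would have to parse that rule
string; the sketch only shows where the string comes from.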

  col LIKE 'abcde%'

If we know, by looking at the rules, that (1) all the characters
before the wildcard map to exactly one collation element, and (2) that
none of them are ignorable, and (3) that the last characters up to the
wildcard cannot be the start of a sequence of characters mapping to
one collation element, we can perform the same optimization as we can
when we're not using collation. That is, we can add the predicate
'abcde' <= col < 'abcde\uFFFF' to limit the scan.
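
As a rough sketch of what that extra predicate does in the non-collation
case: the class below (PrefixRange, a hypothetical name) derives the two
bounds from the literal prefix of a pattern and tests whether a value
falls inside them. It assumes that String.compareTo, which compares
UTF-16 code units, is an acceptable stand-in for the non-collation
ordering, and it ignores escape characters.

```java
// Hypothetical illustration of the range predicate; not Derby code.
final class PrefixRange {
    final String lower;   // inclusive lower bound, e.g. "abcde"
    final String upper;   // exclusive upper bound, e.g. "abcde\uFFFF"

    PrefixRange(String prefix) {
        this.lower = prefix;
        this.upper = prefix + '\uFFFF';
    }

    /** Builds the range from the literal text before the first wildcard. */
    static PrefixRange forLikePattern(String pattern) {
        int end = 0;
        while (end < pattern.length()
                && pattern.charAt(end) != '%'
                && pattern.charAt(end) != '_') {
            end++;
        }
        return new PrefixRange(pattern.substring(0, end));
    }

    /** True if lower <= value < upper in code unit order. */
    boolean contains(String value) {
        return lower.compareTo(value) <= 0 && value.compareTo(upper) < 0;
    }

    public static void main(String[] args) {
        PrefixRange range = forLikePattern("abcde%");
        System.out.println(range.contains("abcdef"));
        System.out.println(range.contains("abcdf"));
    }
}
```

The range only limits the scan; rows inside it still have to be checked
against the full LIKE pattern.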

Whereas databases using collation=UCS_BASIC can always use the prefix
up to the first wildcard character for optimizations like this, I
think databases with territory-based collation can only use the prefix
up to the first occurrence of one of

  a) A wildcard character

  b) An ignorable character

  c) A character that maps to a sequence of collation elements

  d) A sequence of characters mapping to one collation element

  e) A sequence of characters which could be the start of a sequence
  that maps to one collation element, directly followed by a wildcard

So with territory=no_NO, where a < z < aa, we'd optimize the
expression

  col LIKE 'data%'

by adding the predicate 'dat' <= col < 'dat\uFFFF' because the 'a'
right before the '%' could be the start of the sequence 'aa' which
maps to a single collation element (rule (e) above). (Or if we want to
be really sophisticated, I think we can use the predicate 'data' <=
col < 'dataa\uFFFF' in this case, but that would require a more
thorough analysis of the collator's rules.)
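
To see why the trailing 'a' is a problem, here is a small sketch using a
toy rule set rather than the real no_NO rules. The rule string
"< a < d < t < z < aa" is an assumption made up for this example; it
only mimics the a < z < aa ordering by treating "aa" as one contracting
sequence that sorts after 'z'. The class name is hypothetical as well.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Toy illustration only; these are not the rules of any real locale.
final class ToyContractionOrder {

    /** Builds a collator in which "aa" is one element sorting after 'z'. */
    static Collator toyCollator() {
        try {
            return new RuleBasedCollator("< a < d < t < z < aa");
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad toy rules", e);
        }
    }

    public static void main(String[] args) {
        Collator c = toyCollator();
        // "dataa" collates as d, a, t, aa, so it should sort after "dataz"
        // and after "datz", even though it starts with "data".
        System.out.println("dataz < dataa: " + (c.compare("dataz", "dataa") < 0));
        System.out.println("datz  < dataa: " + (c.compare("datz", "dataa") < 0));
    }
}
```

Under such an ordering a value like 'dataa' matches the pattern but lies
beyond the other values that merely extend 'data', which is why the
shorter prefix 'dat' is the safe one to build the range from.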

> Implement LIKE transformations and optimizations for databases using territory-based collations
> -----------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3854
>                 URL: https://issues.apache.org/jira/browse/DERBY-3854
>             Project: Derby
>          Issue Type: Improvement
>    Affects Versions: 10.3.3.0, 10.4.1.3, 10.5.0.0
>            Reporter: Rick Hillegas
>
> The LIKE transformations and optimizations are disabled when using a database with a
> territory-based collation. See the following email thread: http://www.nabble.com/territory-based-collations-and-optimizations-for-the-LIKE-operator-td19111725.html#a19111725
> That thread, in turn, refers to DERBY-1478. It would be nice if we did not have to perform
> full table scans for LIKE queries in databases with territory-based collations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

