lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <>
Subject [jira] Commented: (LUCENE-1279) RangeQuery and RangeFilter should use collation to check for range inclusion
Date Sun, 14 Sep 2008 16:21:44 GMT


Grant Ingersoll commented on LUCENE-1279:

I think the problem is that every single index term has to be converted to a CollationKey
for every single (range) search. 

Yes, agreed.  The main question is whether that would be faster than the String comparisons.
Basically, is constructing a CollationKey plus a bitwise compare faster than a Collator-based
string compare?
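
As a rough sketch of the two alternatives being weighed (not the code in the attached patch;
the class and method names are made up for illustration), the range check can be written
either with Collator.compare() per term, or with one CollationKey built per term and compared
against precomputed keys for the bounds:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or the LUCENE-1279 patch.
public class CollationRangeSketch {

    // Option 1: two collator-based string compares per index term.
    static boolean inRangeByCompare(Collator collator, String term,
                                    String lower, String upper) {
        return collator.compare(term, lower) >= 0
            && collator.compare(term, upper) <= 0;
    }

    // Option 2: build one CollationKey per index term, then do two
    // bitwise key comparisons against keys computed once for the bounds.
    static boolean inRangeByKey(Collator collator, String term,
                                CollationKey lowerKey, CollationKey upperKey) {
        CollationKey termKey = collator.getCollationKey(term);
        return termKey.compareTo(lowerKey) >= 0
            && termKey.compareTo(upperKey) <= 0;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        CollationKey lowerKey = collator.getCollationKey("k");
        CollationKey upperKey = collator.getCollationKey("q");
        for (String term : new String[] {"a", "m", "z"}) {
            System.out.println(term + ": "
                + inRangeByCompare(collator, term, "k", "q") + " / "
                + inRangeByKey(collator, term, lowerKey, upperKey));
        }
    }
}
```

Which option is faster would have to be measured; the sketch only shows that both give the
same inclusion answer.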

Languages, in some cases using the same character repertoire, define different orderings.
Also, I believe some orderings are context dependent - you can't always compare character
by character. So adding this stuff to Lucene would be to duplicate a lot of the stuff that's
already done in the Collator.

Makes sense.  I was just wondering whether there are some shortcuts to be had, since we have
a very particular case; I was thinking it might let us narrow down the range of terms to search.

For instance, hypothetically speaking, say your field had a full range of words starting with
A up to Z, you knew the ordering problem only occurred between L and P, and your lower and
upper terms were K and Q.  Then you could feel confident that you could skip to K and stop
at Q w/o any ramifications.  I realize this is repeating what is in the Collator, but it would
be nice if the Collator exposed that info.  However, perhaps, if using a RuleBasedCollator,
the getRules() method could be used to optimize.  Again, just thinking out loud; I haven't
explored it.
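
For reference, a minimal sketch of how the rules can be reached when the collator happens to
be a RuleBasedCollator. The class and helper names are hypothetical, and deriving a narrowed
term range from the rule string is not attempted here:

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or the LUCENE-1279 patch.
public class CollatorRulesSketch {

    // Returns the rule string if the collator exposes one, else null.
    // Collator.getInstance() is not guaranteed to return a RuleBasedCollator.
    static String rulesOf(Collator collator) {
        if (collator instanceof RuleBasedCollator) {
            return ((RuleBasedCollator) collator).getRules();
        }
        return null;
    }

    // Builds a collator from explicit rules, wrapping the checked exception.
    static RuleBasedCollator fromRules(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        String rules = rulesOf(Collator.getInstance(Locale.ENGLISH));
        System.out.println(rules == null
            ? "collator does not expose rules"
            : "rule string length: " + rules.length());

        // A tiny custom ordering in which c sorts before b, and b before a.
        RuleBasedCollator reversed = fromRules("< c < b < a");
        System.out.println(reversed.compare("c", "a") < 0);
    }
}
```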

I agree, this should still go forward, even as is.

> RangeQuery and RangeFilter should use collation to check for range inclusion
> ----------------------------------------------------------------------------
>                 Key: LUCENE-1279
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.1
>            Reporter: Steven Rowe
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.4
>         Attachments: LUCENE-1279.patch, LUCENE-1279.patch, LUCENE-1279.patch, LUCENE-1279.patch
> See [this java-user discussion|] of problems caused by Unicode code-point comparison,
> instead of collation, in RangeQuery.
> RangeQuery could take in a Locale via a setter, which could be used with a java.text.Collator
> and/or CollationKeys, to handle ranges for languages which have alphabet orderings different
> from those in Unicode.
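
To illustrate the problem described in the issue, here is a small sketch contrasting String
code-point comparison with a locale's Collator. The class and method names are assumptions
for illustration only, not the RangeQuery API:

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration, not part of Lucene or the LUCENE-1279 patch.
public class CodePointVsCollation {

    // Range inclusion using String.compareTo(), i.e. UTF-16 code-unit order.
    static boolean inRangeByCodePoint(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Range inclusion using the ordering defined for the given locale.
    static boolean inRangeByLocale(Locale locale, String term,
                                   String lower, String upper) {
        Collator collator = Collator.getInstance(locale);
        return collator.compare(term, lower) >= 0
            && collator.compare(term, upper) <= 0;
    }

    public static void main(String[] args) {
        String term = "\u00e9"; // e with acute accent
        // U+00E9 is greater than 'z' (U+007A), so the code-point check excludes it.
        System.out.println(inRangeByCodePoint(term, "a", "z"));
        // A French collator orders it next to 'e', inside the a..z range.
        System.out.println(inRangeByLocale(Locale.FRENCH, term, "a", "z"));
    }
}
```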
