lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1279) RangeQuery and RangeFilter should use collation to check for range inclusion
Date Sun, 14 Sep 2008 16:21:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630894#action_12630894 ]

Grant Ingersoll commented on LUCENE-1279:
-----------------------------------------

{quote}
I think the problem is that every single index term has to be converted to a CollationKey
for every single (range) search. 
{quote}

Yes, agreed.  The main question is whether that would be faster than the String comparisons.
Basically, is constructing a CollationKey plus a bitwise compare faster than a string compare?
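
As a rough illustration of the two alternatives being weighed here, the following is a minimal sketch using only java.text. The class and method names (CollationCompareSketch, compareWithCollator, compareWithKeys) are made up for this example and are not part of Lucene or of the attached patch. It shows the two code paths only and makes no claim about which is faster.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Hypothetical sketch class; not Lucene API.
public class CollationCompareSketch {

    // Alternative 1: let the Collator compare the two strings directly.
    static int compareWithCollator(Collator collator, String a, String b) {
        return collator.compare(a, b);
    }

    // Alternative 2: construct a CollationKey for each string, then compare keys.
    // CollationKey.compareTo compares the precomputed key data rather than
    // re-applying the collation rules to the strings.
    static int compareWithKeys(Collator collator, String a, String b) {
        CollationKey keyA = collator.getCollationKey(a);
        CollationKey keyB = collator.getCollationKey(b);
        return keyA.compareTo(keyB);
    }

    public static void main(String[] args) {
        // Locale chosen arbitrarily for the example.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String lowerTerm = "apple";
        String indexTerm = "banana";
        // Both calls should agree on the sign of the comparison.
        System.out.println(Integer.signum(compareWithCollator(collator, lowerTerm, indexTerm)));
        System.out.println(Integer.signum(compareWithKeys(collator, lowerTerm, indexTerm)));
    }
}
```

In a range search the keys for the lower and upper terms could be built once, but a key would still have to be constructed for every index term visited, which is the cost raised in the quote above.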


{quote}
Languages, in some cases using the same character repertoire, define different orderings.
Also, I believe some orderings are context dependent - you can't always compare character
by character. So adding this stuff to Lucene would be to duplicate a lot of the stuff that's
already done in the Collator.
{quote}

Makes sense.  I was just wondering whether there were some shortcuts to be had, since we have a
very particular case, and I was thinking it might allow us to narrow down the range to search.

For instance, hypothetically speaking, say your field had a full range of words starting with
A up to Z, but you knew the ordering problem only occurred between L and P, and your
lower and upper terms were K and Q.  Then you could feel confident that you could skip to K and
stop at Q without any ramifications.  I realize this is repeating what is in the Collator, but
it would be nice if the Collator exposed the info.  Perhaps, if using a RuleBasedCollator,
the getRules() method could be used to optimize.  Again, I'm just thinking out loud; I haven't
explored it.
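
To make the hypothetical concrete, here is a small sketch under stated assumptions. It builds a toy RuleBasedCollator in which "n" sorts before "m", so the ordering problem lies strictly between L and P, and compares a collation-based range check with a plain String.compareTo check. The rule string and the helper names (RangeInclusionSketch, toyCollator, inRange, inRangeByCodeUnit) are invented for illustration. The sketch prints what getRules() exposes but does not attempt to analyse the rules, which is the unexplored part of the idea.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical sketch class; not Lucene API.
public class RangeInclusionSketch {

    // Toy rules (an assumption for this example): "n" is ordered before "m".
    static RuleBasedCollator toyCollator() throws ParseException {
        return new RuleBasedCollator("< k < l < n < m < o < p < q");
    }

    // Range inclusion using the collator's ordering.
    static boolean inRange(Collator collator, String term, String lower, String upper) {
        return collator.compare(term, lower) >= 0 && collator.compare(term, upper) <= 0;
    }

    // Range inclusion using String.compareTo (UTF-16 code unit order).
    static boolean inRangeByCodeUnit(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) throws ParseException {
        RuleBasedCollator collator = toyCollator();

        // The rule text is available, but interpreting it is left open here.
        System.out.println("rules: " + collator.getRules());

        // Bounds inside the affected region: the two orderings disagree about "n".
        System.out.println(inRange(collator, "n", "l", "m"));
        System.out.println(inRangeByCodeUnit("n", "l", "m"));

        // Bounds outside the affected region (K and Q): both orderings agree
        // for every single-letter term covered by the toy rules.
        for (String term : new String[] {"k", "l", "m", "n", "o", "p", "q"}) {
            System.out.println(term + " " + inRange(collator, term, "k", "q")
                    + " " + inRangeByCodeUnit(term, "k", "q"));
        }
    }
}
```

With real locale rules the affected region would have to be derived from the rule text, and context-dependent orderings may make that harder than this toy case suggests.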

I agree, this should still go forward, even as is.


> RangeQuery and RangeFilter should use collation to check for range inclusion
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-1279
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1279
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.1
>            Reporter: Steven Rowe
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-1279.patch, LUCENE-1279.patch, LUCENE-1279.patch, LUCENE-1279.patch
>
>
> See [this java-user discussion|http://www.nabble.com/lucene-farsi-problem-td16977096.html]
> of problems caused by Unicode code-point comparison, instead of collation, in RangeQuery.
> RangeQuery could take in a Locale via a setter, which could be used with a java.text.Collator
> and/or CollationKeys, to handle ranges for languages which have alphabet orderings different
> from those in Unicode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

