lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <>
Subject [jira] Commented: (LUCENE-1279) RangeQuery and RangeFilter should use collation to check for range inclusion
Date Sat, 13 Sep 2008 13:48:44 GMT


Grant Ingersoll commented on LUCENE-1279:

Mostly this consisted of switching away from deprecated Hits in tests. 

It seems the new tests in TestRangeFilter still use Hits.

Also, from the Collator javadocs:
When sorting a list of Strings however, it is generally necessary to compare each String multiple
times. In this case, CollationKeys provide better performance. The CollationKey class converts
a String to a series of bits that can be compared bitwise against other CollationKeys. A CollationKey
is created by a Collator object for a given String. 

I don't think we need to implement this now, but I wonder whether there would be a performance
difference if we created CollationKeys for the comparison.  The big question is whether the cost
of constructing a key for each term outweighs the savings from the repeated comparisons against
lower and upper.
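
To make the trade-off concrete, here is a minimal sketch of the two ways to test range inclusion. It is not taken from the patch: the class and method names are hypothetical, and Locale.ENGLISH is just an assumed locale for the illustration. The Collator version compares the raw Strings twice per term; the CollationKey version builds the keys for lower and upper once, but still has to build one key per term.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration only; not code from the LUCENE-1279 patch.
class CollationKeyRangeSketch {

    // Inclusive range check using Collator.compare on the raw Strings.
    static boolean inRangeWithCompare(Collator collator, String term,
                                      String lower, String upper) {
        return collator.compare(term, lower) >= 0
            && collator.compare(term, upper) <= 0;
    }

    // Inclusive range check using CollationKeys. The keys for the bounds are
    // built once by the caller; one key is still constructed per term.
    static boolean inRangeWithKeys(Collator collator, String term,
                                   CollationKey lowerKey, CollationKey upperKey) {
        CollationKey termKey = collator.getCollationKey(term);
        return termKey.compareTo(lowerKey) >= 0
            && termKey.compareTo(upperKey) <= 0;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH); // assumed locale
        CollationKey lowerKey = collator.getCollationKey("apple");
        CollationKey upperKey = collator.getCollationKey("cherry");
        for (String term : new String[] {"banana", "zebra"}) {
            System.out.println(term
                + " compare=" + inRangeWithCompare(collator, term, "apple", "cherry")
                + " keys=" + inRangeWithKeys(collator, term, lowerKey, upperKey));
        }
    }
}
```

Both checks should give the same answer for the same Collator; which one is faster with only two comparisons per term is exactly the open question, and would need to be measured.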

One more question, and it probably shows my lack of knowledge here, but would it be possible
to enumerate the various codepoints where there are problems and just handle these separately,
somehow?  Basically, how pervasive is the problem?  Would we perhaps be better off having
a check to see if one of these bad codepoints falls in the range of lower/upper and then handling
it separately?  Or, perhaps, some reasoning would allow us to narrow in on the range between
lowerTerm and upperTerm instead of having to check the whole field.  Just thinking out loud...
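
One rough way to get a feel for how pervasive the problem is would be to run both range checks over a sample of terms and list the ones where they disagree. The sketch below does that; the class, the method names, the sample terms and the choice of Locale.ENGLISH are all assumptions made for the illustration, not anything from the patch.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not code from the LUCENE-1279 patch.
class CodePointVsCollationSketch {

    // Inclusive range check using String.compareTo, which compares UTF-16
    // code units (the same as code-point order for the BMP characters here).
    static boolean inRangeByStringOrder(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Inclusive range check using the locale-sensitive Collator.
    static boolean inRangeByCollation(Collator collator, String term,
                                      String lower, String upper) {
        return collator.compare(term, lower) >= 0
            && collator.compare(term, upper) <= 0;
    }

    // Returns the terms for which the two checks give different answers.
    static List<String> disagreements(Collator collator, List<String> terms,
                                      String lower, String upper) {
        List<String> result = new ArrayList<String>();
        for (String term : terms) {
            if (inRangeByStringOrder(term, lower, upper)
                    != inRangeByCollation(collator, term, lower, upper)) {
                result.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH); // assumed locale
        // "\u00e9cole" is "ecole" with an acute accent on the first letter.
        List<String> sample = Arrays.asList("apple", "\u00e9cole", "Mango", "zebra");
        System.out.println(disagreements(collator, sample, "a", "z"));
    }
}
```

In this sample the accented and the capitalized term are the ones that flip, which suggests the mismatches are not limited to a handful of exotic codepoints, but a real answer would need a real index and the locales people actually use.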

Otherwise, looks pretty good.

> RangeQuery and RangeFilter should use collation to check for range inclusion
> ----------------------------------------------------------------------------
>                 Key: LUCENE-1279
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.1
>            Reporter: Steven Rowe
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.4
>         Attachments: LUCENE-1279.patch, LUCENE-1279.patch, LUCENE-1279.patch
> See [this java-user discussion|] of problems caused by Unicode code-point comparison,
> instead of collation, in RangeQuery.
> RangeQuery could take in a Locale via a setter, which could be used with a java.text.Collator
> and/or CollationKeys, to handle ranges for languages which have alphabet orderings different
> from those in Unicode.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
