lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib
Date Mon, 09 Feb 2009 20:07:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671990#action_12671990
] 

Uwe Schindler commented on LUCENE-1470:
---------------------------------------

{quote}
Do we have test code that tests that the most efficient precision is used (as opposed to just
the right bits matching? i.e. for a precisionStep of 4
0x300-0x4ff could be matched with 3-4 with a shift of 8, or 30-4f with a shift of 4, or 300-4ff
with a shift of 0.
{quote}

I misunderstood your note. My last optimizations (now they are again restored in my svn) exactly
handle this. There are two cases:
 - if the left range of a precision has ((min & mask) == 0L) [starts exactly at the beginning
of the next more unprecise range], I leave out the left range for the actual precision and
directly use the next lower prec
 - if the right range of a precision has ((max & mask) == mask) [ends exactly at the end
of the next more unprecise range], I do the same for the right one.

My new code is now cleaner and easier to understand (there were some other unneeded extra
shifts/ands in it). I also merged the 64 bit and 32 bit range splitting and wrap the RangeBuilder
classes accordingly (now abstract with two different range collecting possibilities).

The old code was not aware of this, leading to sometimes left/right precisions that are not
needed. I will add test code in TestTrieUtils, that tests the range split (without an index).
The problem here is, how to test this effectively. I could generate some examples and test
for the resulting range bounds using a custom XxxRangeBuilder in the test, that collects the
ranges into a List and compares this list). Do you have another prossibility to test this
without a lot of manually checks example ranges?

I have now restored all my changes (and did an extra backup). I will now write again the new
Javadocs of TrieUtils and package.html. After that I will post a patch with the final API.
The extra and new tests will be added later. First I want to fixate and document  the API.

> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: fixbuild-LUCENE-1470.patch, fixbuild-LUCENE-1470.patch, LUCENE-1470-readme.patch,
LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch,
LUCENE-1470.patch, LUCENE-1470.patch, trie.zip, TrieRangeFilter.java, TrieUtils.java, TrieUtils.java,
TrieUtils.java, TrieUtils.java, TrieUtils.java
>
>
> According to the thread in java-dev (http://www.gossamer-threads.com/lists/lucene/java-dev/67807
and http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to include my fast
numerical range query implementation into lucene contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in index in a special format.
> The idea behind this is to store the longs in different precision in index
> and partition the query range in such a way, that the outer boundaries are
> search using terms from the highest precision, but the center of the search
> Range with lower precision. The implementation stores the longs in 8
> different precisions (using a class called TrieUtils). It also has support
> for Doubles, using the IEEE 754 floating-point "double format" bit layout
> with some bit mappings to make them binary sortable. The approach is used in
> rather big indexes, query times are even on low performance desktop
> computers <<100 ms (!) for very big ranges on indexes with 500000 docs.
> I called this RangeQuery variant and format "TrieRangeRange" query because
> the idea looks like the well-known Trie structures (but it is not identical
> to real tries, but algorithms are related to it).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message