lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib
Date Mon, 09 Feb 2009 16:14:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671907#action_12671907 ]

Uwe Schindler commented on LUCENE-1470:
---------------------------------------

I spent the whole day preparing the final version, including all the javadoc, but then I
overwrote the wrong file, and all of today's work on the trie code is gone. Give me one more
day and I will redo everything. I have changed a lot in the trie code since yesterday. Your
code was a little better in the case where the range bound at one precision lies exactly on
the range's start or end (in that case the precision can be left out; in your code these are
the booleans needUpper and needLower). I implemented this in a similar way.

I also extended the interface a little, but that is work I have to redo, so it will now take
longer. Most of the work is writing documentation and javadocs. If everything had gone OK (and
I had not overwritten/updated svn in the wrong way), I would be finished now :(

bq. Do we have test code that tests that the most efficient precision is used (as opposed
to just the right bits matching)? i.e. for a precisionStep of 4,
0x300-0x4ff could be matched with 3-4 with a shift of 8, or 30-4f with a shift of 4, or 300-4ff
with a shift of 0.

Finding the most efficient precision is sometimes hard, but the needUpper/needLower
optimization above is really good in some cases (it depends on the range). I will think about it.
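
The 0x300-0x4ff example from the question can be worked through with a small range splitter.
The following is only a sketch under stated assumptions, not the code from the attached
patches: the class TrieSplitSketch, its split method and the PrefixRange holder are
hypothetical names, and the inputs are assumed to be non-negative longs. It shows how a
needLower/needUpper check lets a precision be skipped when a bound already lies exactly on
that precision's boundary.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split [min, max] into prefix ranges at increasing shifts. */
final class TrieSplitSketch {

    /** One sub-range: bounds already shifted right by 'shift' bits. */
    static final class PrefixRange {
        final long lower;
        final long upper;
        final int shift;

        PrefixRange(long lower, long upper, int shift) {
            this.lower = lower;
            this.upper = upper;
            this.shift = shift;
        }
    }

    /** Assumes 0 <= min and 0 <= max; valueBits is the width of the stored values. */
    static List<PrefixRange> split(long min, long max, int precisionStep, int valueBits) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<PrefixRange> out = new ArrayList<>();
        if (min > max) {
            return out; // empty range
        }
        for (int shift = 0; ; shift += precisionStep) {
            // Bits that belong to the current precision level.
            long mask = ((1L << precisionStep) - 1L) << shift;
            long diff = 1L << (shift + precisionStep);
            // A bound that sits exactly on the block boundary needs no terms at this level.
            boolean needLower = (min & mask) != 0L;
            boolean needUpper = (max & mask) != mask;
            long nextMin = (needLower ? min + diff : min) & ~mask;
            long nextMax = (needUpper ? max - diff : max) & ~mask;
            boolean wrapped = nextMin < min || nextMax > max; // arithmetic overflow guard
            if (shift + precisionStep >= valueBits || nextMin > nextMax || wrapped) {
                // No coarser level can cover the rest: emit what is left at this shift.
                out.add(new PrefixRange(min >>> shift, max >>> shift, shift));
                return out;
            }
            if (needLower) {
                out.add(new PrefixRange(min >>> shift, (min | mask) >>> shift, shift));
            }
            if (needUpper) {
                out.add(new PrefixRange((max & ~mask) >>> shift, max >>> shift, shift));
            }
            min = nextMin;
            max = nextMax;
        }
    }
}
```

With precisionStep 4, split(0x300L, 0x4ffL, 4, 64) yields a single range 3-4 at shift 8,
because both bounds are exact at shifts 0 and 4. For 0x301-0x4fe the same call yields three
ranges: 0x301-0x30f and 0x4f0-0x4fe at shift 0, plus 0x31-0x4e at shift 4.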

Uwe

> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: fixbuild-LUCENE-1470.patch, fixbuild-LUCENE-1470.patch, LUCENE-1470-readme.patch,
LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch, LUCENE-1470.patch,
LUCENE-1470.patch, LUCENE-1470.patch, trie.zip, TrieRangeFilter.java, TrieUtils.java, TrieUtils.java,
TrieUtils.java, TrieUtils.java, TrieUtils.java
>
>
> According to the thread in java-dev (http://www.gossamer-threads.com/lists/lucene/java-dev/67807
> and http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to include my fast
> numerical range query implementation in Lucene contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in the index in a special format.
> The idea behind this is to store the longs at different precisions in the index
> and partition the query range in such a way that the outer boundaries are
> searched using terms from the highest precision, but the center of the search
> range with lower precision. The implementation stores the longs in 8
> different precisions (using a class called TrieUtils). It also has support
> for doubles, using the IEEE 754 floating-point "double format" bit layout
> with some bit mappings to make them binary sortable. The approach is used in
> rather big indexes; query times, even on low-performance desktop
> computers, are <<100 ms (!) for very big ranges on indexes with 500000 docs.
> I called this RangeQuery variant and format "TrieRangeRange" query because
> the idea looks like the well-known trie structures (it is not identical
> to real tries, but the algorithms are related).
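
The two encodings mentioned in the description can be illustrated roughly as follows. This is
a sketch under assumptions, not the actual TrieUtils code (which is not shown in this message):
the class TrieEncodingSketch and its method names are hypothetical, and the guess that
"8 different precisions" means one precision per byte of a 64-bit value is an inference.
The first method maps a double to a long whose signed order matches the numeric order of the
doubles; the second lists the prefixes of a long at each precision.

```java
/** Hypothetical sketch of a sortable double mapping and per-precision prefixes. */
final class TrieEncodingSketch {

    /**
     * Maps a double to a long so that comparing the longs (signed) gives the same
     * order as comparing the doubles. Positive values keep their IEEE 754 bits;
     * negative values have all non-sign bits flipped, so a larger magnitude
     * sorts lower.
     */
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        return bits < 0 ? bits ^ 0x7fffffffffffffffL : bits;
    }

    /**
     * Returns the prefixes of a long at every precision, from full precision
     * (shift 0) to the coarsest. The sign bit is flipped first so that unsigned
     * prefix order matches signed numeric order.
     */
    static long[] prefixes(long value, int precisionStep) {
        if (precisionStep < 1 || precisionStep > 64) {
            throw new IllegalArgumentException("precisionStep must be in 1..64");
        }
        int count = (64 + precisionStep - 1) / precisionStep; // ceil(64 / step)
        long ordered = value ^ Long.MIN_VALUE;
        long[] out = new long[count];
        for (int i = 0; i < count; i++) {
            out[i] = ordered >>> (i * precisionStep); // drop the lowest i*step bits
        }
        return out;
    }
}
```

With a precisionStep of 8, prefixes returns 8 values per long, which would correspond to the
8 precisions mentioned above if that reading is right. Indexing every prefix is what allows
the center of a range to be matched with a few coarse terms instead of many full-precision ones.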

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

