lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-1470) Add TrieRangeQuery to contrib
Date Thu, 27 Nov 2008 08:37:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651279#action_12651279
] 

paul.elschot@xs4all.nl edited comment on LUCENE-1470 at 11/27/08 12:37 AM:
-----------------------------------------------------------------

{quote}Independent on the selected range the number of terms to be visited is limited by my
algorithm to the number of 3825 = (a maximum of 255 terms in the lowermost range)(a maximum
of 255 terms in the uppermost range)(255)(255)......+(255 in center of range).{quote}

The code uses a trie factor of 256, or 8 bits in a long of 64 bits.
Would it be possible to use lower values for this trie factor, like 16 (4 bits) or even 4
(2 bits)?
In such cases the (rough) maximum number of terms for a single ended range becomes smaller:
(256-1) * (64/8) = 255 * 8 = 2040
(16-1) * (64/4) = 15 * 16 = 240
(4-1) * (64/2) = 3 * 32 = 96
This reduction comes at the cost doubling or quadrupling the number in the indexed terms in
the lower precision field.

The number of characters in the lower precision terms is not really relevant in the term index,
because terms are indexed with common prefixes. Therefore in these cases one could just use
a character to encode the 4 bits or 2 bits.

So the question is would it be possible to specify the trie factor when building and using
the index?


      was (Author: paul.elschot@xs4all.nl):
    {quote}Independent on the selected range the number of terms to be visited is limited
by my algorithm to the number of 3825 = (a maximum of 255 terms in the lowermost range)(a
maximum of 255 terms in the uppermost range)(255)(255)......+(255 in center of range).{quote}

The code uses a trie factor of 256, or 8 bits in a long of 64 bits.
Would it be possible to use lower values for this trie factor, like 16 (4 bits) or even 4
(2 bits)?
In both cases the maximum number of terms is smaller:
(16-1) * (64/4) = 15 * 16 = 240
(4-1) * (64/2) = 3 * 32 = 96
This reduction comes at the cost doubling or quadrupling the number in the indexed terms in
the lower precision field.

The number of characters in the lower precision terms is not really relevant in the term index,
because terms are indexed with common prefixes. Therefore in these cases one could just use
a character to encode the 4 bits or 2 bits.

So the question is would it be possible to specify the trie factor when building and using
the index?

  
> Add TrieRangeQuery to contrib
> -----------------------------
>
>                 Key: LUCENE-1470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1470.patch, LUCENE-1470.patch
>
>
> According to the thread in java-dev (http://www.gossamer-threads.com/lists/lucene/java-dev/67807
and http://www.gossamer-threads.com/lists/lucene/java-dev/67839), I want to include my fast
numerical range query implementation into lucene contrib-queries.
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in index in a special format.
> The idea behind this is to store the longs in different precision in index
> and partition the query range in such a way, that the outer boundaries are
> search using terms from the highest precision, but the center of the search
> Range with lower precision. The implementation stores the longs in 8
> different precisions (using a class called TrieUtils). It also has support
> for Doubles, using the IEEE 754 floating-point "double format" bit layout
> with some bit mappings to make them binary sortable. The approach is used in
> rather big indexes, query times are even on low performance desktop
> computers <<100 ms (!) for very big ranges on indexes with 500000 docs.
> I called this RangeQuery variant and format "TrieRangeRange" query because
> the idea looks like the well-known Trie structures (but it is not identical
> to real tries, but algorithms are related to it).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message