Date Thu, 27 Nov 2008 08:37:44 GMT
Subject [jira] Issue Comment Edited: (LUCENE-1470) Add TrieRangeQuery to contrib
paul.elschot@xs4all.nl edited comment on LUCENE-1470 at 11/27/08 12:37 AM:
-----------------------------------------------------------------

{quote}Independent on the selected range the number of terms to be visited is limited by my
algorithm to the number of 3825 = (a maximum of 255 terms in the lowermost range)(a maximum
of 255 terms in the uppermost range)(255)(255)......+(255 in center of range).{quote}

The code uses a trie factor of 256, that is, 8 bits at a time from a 64-bit long.
Would it be possible to use lower values for this trie factor, like 16 (4 bits) or even 4
(2 bits)?
In such cases the (rough) maximum number of terms for a single-ended range becomes smaller:
(256-1) * (64/8) = 255 * 8 = 2040
(16-1) * (64/4) = 15 * 16 = 240
(4-1) * (64/2) = 3 * 32 = 96
This reduction comes at the cost of doubling or quadrupling the number of indexed terms in
the lower precision field.
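
The arithmetic above can be checked with a short sketch. The class and method names
are hypothetical and not part of the patch; the sketch assumes a 64-bit value and that
the number of bits per trie level divides the value width evenly.

```java
// Hypothetical helper for illustration only; not part of the LUCENE-1470 patch.
final class TrieTermBound {

    /**
     * Rough upper bound on the number of terms for a single-ended range:
     * (trieFactor - 1) terms at each of the (valueBits / bitsPerLevel) levels.
     */
    static int maxTermsSingleEnded(int bitsPerLevel, int valueBits) {
        if (bitsPerLevel <= 0 || bitsPerLevel > 16 || valueBits % bitsPerLevel != 0) {
            throw new IllegalArgumentException("bitsPerLevel must be 1..16 and divide valueBits");
        }
        int trieFactor = 1 << bitsPerLevel;    // 256, 16 or 4 in the cases above
        int levels = valueBits / bitsPerLevel; // 8, 16 or 32 in the cases above
        return (trieFactor - 1) * levels;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {8, 4, 2}) {
            System.out.println("trie factor " + (1 << bits) + ": "
                    + maxTermsSingleEnded(bits, 64) + " terms");
        }
    }
}
```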

The number of characters in the lower precision terms is not really relevant in the term index,
because terms are indexed with common prefixes. Therefore in these cases one could just use
a character to encode the 4 bits or 2 bits.
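
As an illustration of the one-character-per-group idea, here is a minimal sketch. It is
not how the contributed code encodes its terms: the class name, the base character and
the treatment of the long as an unsigned bit pattern are all assumptions made for the
example. Because the most significant bits come first, a lower precision term is simply
a prefix of the full term.

```java
// Hypothetical encoder for illustration only; not the encoding used in the patch.
final class BitGroupTermEncoder {

    // Assumed base character: each bit group is stored as (BASE + group value).
    private static final char BASE = 'a';

    /** Encodes the 64 bits of value, most significant group first, one char per group. */
    static String encode(long value, int bitsPerChar) {
        if (bitsPerChar < 1 || bitsPerChar > 8 || 64 % bitsPerChar != 0) {
            throw new IllegalArgumentException("bitsPerChar must be 1, 2, 4 or 8");
        }
        long mask = (1L << bitsPerChar) - 1;
        StringBuilder sb = new StringBuilder(64 / bitsPerChar);
        for (int shift = 64 - bitsPerChar; shift >= 0; shift -= bitsPerChar) {
            // The value is treated as an unsigned bit pattern; sign handling is left out.
            sb.append((char) (BASE + ((value >>> shift) & mask)));
        }
        return sb.toString();
    }

    /** A lower precision term keeps only the first `groups` characters of the full term. */
    static String lowerPrecision(String fullTerm, int groups) {
        return fullTerm.substring(0, groups);
    }

    public static void main(String[] args) {
        String term = encode(0x123456789ABCDEF0L, 4);
        System.out.println(term);
        System.out.println(lowerPrecision(term, 4));
    }
}
```

With 4 bits per character a long becomes a 16-character term, and with 2 bits a
32-character term; for non-negative values the string order matches the numeric order.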

So the question is: would it be possible to specify the trie factor when building and using
the index?

