lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible
Date Sun, 22 Feb 2009 19:19:01 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675695#action_12675695
] 

Uwe Schindler commented on LUCENE-1541:
---------------------------------------

After thinking one night longer about the whole issue, I suspect, that non-equidistant precision
steps are really needed:

Ning's comment is correct about the number of terms. But if you only index long values from,
e.g., 0L to 10000L, you create a lot of terms for shift values 0 to 14 (because the terms
are in this range). For shift values 15 to 63, the term is always the same constant term.
The index' TermEnum so contains not many additional values (because its only *one* term for
*all* documents), only additional TermDocs entries are created. It's the same like adding
one "flag" term to all documents. This does not use much additional space in index. When you
query a range, these terms are never used, but they do not hurt.

The additional space for the trie terms is generated by higher precision (lower shift) values.
If you index with precision step 4 or 2 instead of a precision step of 8, you create a lot
of *different* terms for the lower shift values. The constant terms in the higher shifts are
still always the same and does not consume much space.

I will create a small comparison on index size for long values without higher bits, but I
suspect, that index size without lower precision terms reduces space significant. If this
is the case, I do not think the additional complexity of the API is needed for this low impact.
If somebody really wants to optimize index size so much, he can create a optimized fork of
TrieRange in his project that indexes with non-equidistant precision steps. On the other hand,
I would suggest to use ints/floats instead of longs/doubles, if only lower precision is needed.
In this case, less terms will be created. For floats in comparision to doubles, the effect
will be bigger (because even small doubles use a lot of hgher precision bits), a non-equidistant
precision step will not help very much here.

> Trie range - make trie range indexing more flexible
> ---------------------------------------------------
>
>                 Key: LUCENE-1541
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1541
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Ning Li
>            Assignee: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1541.patch, LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is specified. With
a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms
for a range can be large. With a small precision step (say 2), the number of terms for a range
is smaller but a value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for different precisions.
An expert can use this option to keep the number of terms for a range small and at the same
time index a value in a small number of terms. See the discussion in LUCENE-1470 that results
in this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message