lucene-dev mailing list archives

From "Renaud Delbru (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-2886) Adaptive Frame Of Reference
Date Fri, 04 Feb 2011 10:43:48 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990509#comment-12990509 ]

Renaud Delbru edited comment on LUCENE-2886 at 2/4/11 10:42 AM:
----------------------------------------------------------------

Hi Michael, Robert,
great to hear that the code is useful; I am looking forward to seeing some benchmarks.
I think the VarIntBlock approach is a good idea. Concerning the two unused "frame" codes,
it would not cost much to add them. They might be useful for the frequency inverted lists.
However, I am not sure they would be used that much. In our experiments, we had a version of
AFOR allowing frames of 8, 16 and 32 integers with allOnes and allZeros. The gain was
minimal, on the order of 0.x% index size reduction, because these cases occurred
very rarely. Still, this is better than nothing. However, in the case of simple64, we
are not talking about small frames (up to 32 integers), but frames of 120 to 240 integers. Therefore,
I expect the probability of encountering 120 or 240 consecutive ones to drop. Maybe we can
use them for more clever configurations such as
- interleaved sequences of 1-bit and 2-bit integers
- interleaved sequences of 2-bit and 3-bit integers
or something like this.
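
To make the first interleaved configuration concrete, here is a minimal sketch of how such a frame could be packed. It is not taken from the attached patches. It assumes a 64-bit word split into 4 selector bits and 60 data bits, and the class name, the selector value and the bit order are all hypothetical. With alternating 1-bit and 2-bit slots, each pair costs 3 bits, so 40 integers fit in the 60 data bits.

```java
// Sketch only: an interleaved "1 bit, 2 bits, 1 bit, 2 bits, ..." frame.
// Assumption: 4 selector bits in the top of the word, 60 data bits below.
final class InterleavedFrame {
    static final int DATA_BITS = 60;
    // Hypothetical selector code for this configuration.
    static final long SELECTOR = 14L;
    // Each (1-bit, 2-bit) pair takes 3 bits, so 20 pairs = 40 integers.
    static final int COUNT = (DATA_BITS / 3) * 2;

    // Slot width at position i: even positions hold 1 bit, odd positions 2 bits.
    static int width(int i) {
        return (i % 2 == 0) ? 1 : 2;
    }

    // True if values[offset .. offset + COUNT) fits the alternating pattern.
    static boolean fits(int[] values, int offset) {
        if (offset < 0 || offset + COUNT > values.length) {
            return false;
        }
        for (int i = 0; i < COUNT; i++) {
            int v = values[offset + i];
            if (v < 0 || v >= (1 << width(i))) {
                return false;
            }
        }
        return true;
    }

    // Packs COUNT values, lowest bits first, and adds the selector on top.
    static long pack(int[] values, int offset) {
        if (!fits(values, offset)) {
            throw new IllegalArgumentException("values do not fit the 1-bit/2-bit pattern");
        }
        long word = 0L;
        int shift = 0;
        for (int i = 0; i < COUNT; i++) {
            word |= ((long) values[offset + i]) << shift;
            shift += width(i);
        }
        return word | (SELECTOR << DATA_BITS);
    }

    // Reads the COUNT values back; the selector bits are ignored here.
    static int[] unpack(long word) {
        int[] out = new int[COUNT];
        int shift = 0;
        for (int i = 0; i < COUNT; i++) {
            int w = width(i);
            out[i] = (int) ((word >>> shift) & ((1L << w) - 1));
            shift += w;
        }
        return out;
    }
}
```

Whether such a configuration pays off depends on how often the data actually alternates in this way, which only measurements on real inverted lists can tell.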
The best approach would be to run some tests to see which new configurations make sense, e.g. how
many times an allOnes config (or any other config) is selected, and then choose which ones to add. But
this can be a tedious task with only a limited benefit.
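
As a rough illustration of the kind of test meant here, the sketch below counts how many disjoint frames of identical values (for example 120 or 240 consecutive ones) could be carved out of a sequence. It is a simplification under stated assumptions, not the selection logic of the AFOR or Simple64 code: the class and method names are hypothetical, and a real encoder would also have to account for where the previous frame ended.

```java
// Sketch only: estimates how often an allOnes / allZeros style frame could apply.
final class FrameConfigStats {
    // Counts disjoint windows of frameSize consecutive values equal to target.
    // Every maximal run of length r contributes r / frameSize frames.
    static int countUniformFrames(int[] values, int frameSize, int target) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be positive");
        }
        int frames = 0;
        int run = 0;
        for (int v : values) {
            if (v == target) {
                run++;
                if (run == frameSize) {
                    frames++;   // one full frame found, start counting the next one
                    run = 0;
                }
            } else {
                run = 0;        // run broken by a different value
            }
        }
        return frames;
    }
}
```

Running this over the frequency lists of a test index with frame sizes 120 and 240 would give a first idea of whether the two codes are worth reserving for allOnes. As a back-of-the-envelope guide, if each value were independently equal to 1 with probability p (an assumption, not a measurement), a given window of n values would be all ones with probability p^n, which shrinks quickly as n grows from 32 to 120 or 240.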

> Adaptive Frame Of Reference 
> ----------------------------
>
>                 Key: LUCENE-2886
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2886
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Codecs
>            Reporter: Renaud Delbru
>             Fix For: 4.0
>
>         Attachments: LUCENE-2886_simple64.patch, LUCENE-2886_simple64_varint.patch, lucene-afor.tar.gz
>
>
> We could test the implementation of the Adaptive Frame Of Reference [1] on the lucene-4.0 branch.
> I am providing the source code of its implementation. Some work needs to be done, as
> this implementation works on the old lucene-1458 branch.
> I will attach a tarball containing a running version (with tests) of the AFOR implementation,
> as well as the implementations of PFOR and of Simple64 (a Simple-family codec working on 64-bit
> words) that were used in the experiments in [1].
> [1] http://www.deri.ie/fileadmin/documents/deri-tr-afor.pdf
