lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4643) PackedInts: convenience classes to write blocks of packed ints
Date Mon, 07 Jan 2013 18:00:16 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546096#comment-13546096 ]

Robert Muir commented on LUCENE-4643:
-------------------------------------

You're right: I forgot about the "real use-cases". Something like what we used in stored fields
would be really nice to factor out somehow.
I'd also like to investigate its use for docvalues variable-length byte[], for example.

{quote}
Another thing to know is that if all values are positive, minValue is likely to be 0. For
example, let's say the actual min is 200 and the max is 2000. Given that encoding the [0-2000]
range requires as many bits per value as encoding the [200-2000] range, I set minValue=0.
This will require only one bit in the token instead of two bytes (a VInt >= 2^7) for the
minimum. So in the end, even if one bit is wasted for the minimum value because of zig-zag
encoding, this is not too bad.
{quote}

Ok, this makes sense. +1 :)
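
For illustration only, the arithmetic in the quoted passage can be checked with a small
standalone sketch. It is not taken from the patch: the class and method names below are
hypothetical, the VInt layout is assumed to be the usual 7 payload bits per byte, and the
one-bit token flag mentioned in the quote is not modeled.

```java
// Hypothetical helper class, not part of LUCENE-4643.
class MinValueSketch {

    // Bits needed to store any value in [0, maxDelta]; returns 0 when maxDelta is 0.
    static int bitsRequired(long maxDelta) {
        return 64 - Long.numberOfLeadingZeros(maxDelta);
    }

    // Zig-zag encoding: maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    // so that small magnitudes stay small, at the cost of one extra bit.
    static long zigZagEncode(long v) {
        return (v << 1) ^ (v >> 63);
    }

    // Bytes used by a variable-length integer with 7 payload bits per byte
    // (assumed layout; v is treated as unsigned).
    static int vLongBytes(long v) {
        int bytes = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        long min = 200, max = 2000;
        // Both ranges need the same width, so subtracting the real minimum gains nothing.
        System.out.println("bits for [0-2000]:   " + bitsRequired(max));
        System.out.println("bits for [200-2000]: " + bitsRequired(max - min));
        // Storing min=200 explicitly costs a multi-byte variable-length int.
        System.out.println("bytes for zig-zag(200) as VLong: " + vLongBytes(zigZagEncode(min)));
    }
}
```

Both ranges come out at 11 bits per value, while writing 200 as a zig-zag VLong takes two
bytes, which is the trade-off the quote describes.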
                
> PackedInts: convenience classes to write blocks of packed ints
> --------------------------------------------------------------
>
>                 Key: LUCENE-4643
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4643
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-4643.patch, LUCENE-4643.patch
>
>
> It is often useful to divide a packed stream into fixed blocks which are all compressed
> independently:
>  * if your sequence of ints is very large, you won't have to buffer everything into memory
>    to compute the required number of bits per value,
>  * the compression ratio will be better in case of rare extreme values.
> The only drawback compared to the original PackedInts API is that the stream cannot be
> directly used to deserialize a random-access PackedInts.Reader (but for sequential access,
> this is just fine).
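
As a rough illustration of the second bullet in the description above, the sketch below
compares the payload size of one big block against fixed blocks of 64 values when a single
outlier is present. It is a minimal sketch under stated assumptions, not the code from the
attached patch: the class and method names are hypothetical, per-block header bytes are
ignored, and max - min is assumed not to overflow a long.

```java
// Hypothetical illustration, not the API proposed in LUCENE-4643.
class BlockPackingSketch {

    // Bits needed to store any value in [0, maxDelta]; returns 0 when maxDelta is 0.
    static int bitsRequired(long maxDelta) {
        return 64 - Long.numberOfLeadingZeros(maxDelta);
    }

    // Payload bits if values[from, to) are stored as (value - min),
    // all at the width required by the largest delta in that range.
    static long payloadBits(long[] values, int from, int to) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (int i = from; i < to; i++) {
            min = Math.min(min, values[i]);
            max = Math.max(max, values[i]);
        }
        return (long) bitsRequired(max - min) * (to - from);
    }

    // Payload bits when each fixed-size block picks its own min and width.
    static long blockedPayloadBits(long[] values, int blockSize) {
        long total = 0;
        for (int start = 0; start < values.length; start += blockSize) {
            int end = Math.min(values.length, start + blockSize);
            total += payloadBits(values, start, end);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] values = new long[256];
        for (int i = 0; i < values.length; i++) {
            values[i] = i % 16;          // small values, 4 bits each
        }
        values[100] = 1L << 40;          // one rare extreme value
        System.out.println("single block: " + payloadBits(values, 0, values.length) + " bits");
        System.out.println("blocks of 64: " + blockedPayloadBits(values, 64) + " bits");
    }
}
```

With this input only the block containing the outlier pays for the wide values, so the
blocked layout needs 3392 payload bits against 10496 for a single block.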

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


