lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4161) Make PackedInts usable by codecs
Date Sun, 01 Jul 2012 00:00:48 GMT


Michael McCandless commented on LUCENE-4161:

bq. The meaning of n is actually a bit complicated.

Thank you for the explanation! That makes sense. I think "iterations" is a good name? Or ... maybe
we simply leave it as n and put this nice explanation in a comment?

Naming is the hardest part :)
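One way to see why a plain count can be confusing: a bulk encoder/decoder that works on whole 64-bit blocks naturally processes the smallest group of longs that holds a whole number of values, so n may count such groups ("iterations") rather than individual values. The sketch below assumes that reading. The class `IterationsCodecSketch`, its `blockCount`/`valueCount` fields and its `encode`/`decode` signatures are hypothetical and are not taken from the patch.

```java
// Hypothetical sketch: names and layout are illustrative, not Lucene's API.
// Assumes the bulk-method count argument means "iterations", i.e. whole
// block groups, not individual values.
final class IterationsCodecSketch {
    final int bitsPerValue;
    final int blockCount;  // 64-bit blocks consumed per iteration
    final int valueCount;  // values produced per iteration
    private final long mask;

    IterationsCodecSketch(int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 64) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..64");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        // Smallest run of longs that holds a whole number of values.
        int g = gcd(64, bitsPerValue);
        this.blockCount = bitsPerValue / g;
        this.valueCount = 64 / g;
    }

    private static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    // Packs iterations * valueCount values MSB-first into iterations * blockCount
    // longs; the target blocks are assumed to be zeroed.
    void encode(long[] values, int valuesOffset, long[] blocks, int blocksOffset, int iterations) {
        long bitPos = (long) blocksOffset * 64;
        for (int i = 0; i < iterations * valueCount; i++) {
            long v = values[valuesOffset + i] & mask;
            int block = (int) (bitPos >>> 6);
            int remaining = 64 - (int) (bitPos & 63); // free bits left in this block
            if (bitsPerValue <= remaining) {
                blocks[block] |= v << (remaining - bitsPerValue);
            } else { // value straddles two blocks
                int spill = bitsPerValue - remaining;
                blocks[block] |= v >>> spill;
                blocks[block + 1] |= v << (64 - spill);
            }
            bitPos += bitsPerValue;
        }
    }

    // Inverse of encode: iterations * blockCount longs -> iterations * valueCount values.
    void decode(long[] blocks, int blocksOffset, long[] values, int valuesOffset, int iterations) {
        long bitPos = (long) blocksOffset * 64;
        for (int i = 0; i < iterations * valueCount; i++) {
            int block = (int) (bitPos >>> 6);
            int remaining = 64 - (int) (bitPos & 63); // unread bits left in this block
            long v;
            if (bitsPerValue <= remaining) {
                v = (blocks[block] >>> (remaining - bitsPerValue)) & mask;
            } else { // value straddles two blocks
                int spill = bitsPerValue - remaining;
                long high = blocks[block] & ((1L << remaining) - 1);
                v = (high << spill) | (blocks[block + 1] >>> (64 - spill));
            }
            values[valuesOffset + i] = v;
            bitPos += bitsPerValue;
        }
    }
}
```

With bitsPerValue = 12, one iteration covers 3 longs (192 bits) and 16 values, so `decode(blocks, 0, values, 0, 2)` reads 6 longs and writes 32 values. Under this assumption a caller cannot ask for, say, 5 values, which may be why a name that says "iterations" reads better than a bare n.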

bq. What additional methods do you think we need?

I'm not sure off-hand yet ... we've been iterating in LUCENE-3892 to find the least-cost way
to decode from the underlying byte-based storage in the IndexInput, but with no clear
fastest solution yet. Logically we are currently storing an int[] and decoding into int[],
so I guess encode/decode to/from int[]? We should probably try long[] as the backing too
... but I think we should explore this (adding int[]-based methods) under a new issue? This
patch is already great progress.
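To make "encode/decode to/from int[]" over byte-based storage concrete, here is a minimal sketch in plain Java. Assumptions: a `byte[]` stands in for the bytes read from an IndexInput, values are packed MSB-first with 1..32 bits each, and `IntPackingSketch` with its `encode`/`decode` methods is a hypothetical name, not an existing Lucene API.

```java
// Hypothetical sketch: int[]-oriented bulk methods over byte-based storage.
// A byte[] stands in for bytes read from an IndexInput; not Lucene's API.
final class IntPackingSketch {
    private static long mask(int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 32) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..32");
        }
        return (1L << bitsPerValue) - 1;
    }

    // Packs count values MSB-first into dst; returns the number of bytes written.
    static int encode(int[] src, int srcOffset, byte[] dst, int dstOffset, int count, int bitsPerValue) {
        long mask = mask(bitsPerValue);
        long acc = 0;    // the low accBits bits are pending output; higher bits are stale
        int accBits = 0;
        int out = dstOffset;
        for (int i = 0; i < count; i++) {
            acc = (acc << bitsPerValue) | (src[srcOffset + i] & mask);
            accBits += bitsPerValue;
            while (accBits >= 8) { // flush every complete byte
                accBits -= 8;
                dst[out++] = (byte) (acc >>> accBits);
            }
        }
        if (accBits > 0) {
            dst[out++] = (byte) (acc << (8 - accBits)); // zero-pad the final byte
        }
        return out - dstOffset;
    }

    // Decodes count values of bitsPerValue bits from src into dst.
    static void decode(byte[] src, int srcOffset, int[] dst, int dstOffset, int count, int bitsPerValue) {
        long mask = mask(bitsPerValue);
        long acc = 0;    // the low accBits bits are unread input; higher bits are stale
        int accBits = 0;
        int in = srcOffset;
        for (int i = 0; i < count; i++) {
            while (accBits < bitsPerValue) { // refill one byte at a time
                acc = (acc << 8) | (src[in++] & 0xFF);
                accBits += 8;
            }
            accBits -= bitsPerValue;
            dst[dstOffset + i] = (int) ((acc >>> accBits) & mask);
        }
    }
}
```

The byte-at-a-time refill is simply the easiest correct choice, not necessarily the fastest one; a long[]-backed variant, which the comment also suggests trying, would consume 64 bits per refill instead.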
> Make PackedInts usable by codecs
> --------------------------------
>                 Key: LUCENE-4161
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/store
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-4161.patch
> Some codecs might be interested in using PackedInts.{Writer,Reader,ReaderIterator} to read and write fixed-size values efficiently.
> The problem is that the serialization format is self-contained and always writes the name of the codec, its version, its number of bits per value and its format. For example, if you want to use packed ints to store your postings list, this is a lot of overhead (at least ~60 bytes per term if you use only one Writer per term, more otherwise).
> Users should be able to externalize the storage of metadata to save space. For example, to use PackedInts to store a postings list, one should be able to store the codec name, its version and the number of bits per doc in the header of the terms+postings list instead of having to write it once (or more!) per term.
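The proposal amounts to writing the shared metadata once per file instead of once per Writer. Here is a minimal sketch of the size difference, using plain `java.io.DataOutputStream` rather than Lucene's IndexOutput. The codec name, version, header layout and the class `ExternalHeaderSketch` are all hypothetical, and the header is illustrative only. It is not meant to reproduce the ~60-byte overhead cited above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical layout for illustration only; this is not Lucene's on-disk format.
final class ExternalHeaderSketch {
    static final String CODEC = "PackedIntsSketch"; // hypothetical codec name
    static final int VERSION = 0;                   // hypothetical format version

    // Metadata that a self-contained stream repeats for every Writer.
    static void writeHeader(DataOutputStream out, int bitsPerValue) throws IOException {
        out.writeUTF(CODEC);
        out.writeInt(VERSION);
        out.writeByte(bitsPerValue);
    }

    // Packs values MSB-first (bitsPerValue in 1..32) with no metadata at all.
    static void writeBody(DataOutputStream out, long[] values, int bitsPerValue) throws IOException {
        if (bitsPerValue < 1 || bitsPerValue > 32) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..32");
        }
        long mask = (1L << bitsPerValue) - 1;
        long acc = 0;
        int accBits = 0;
        for (long v : values) {
            acc = (acc << bitsPerValue) | (v & mask);
            accBits += bitsPerValue;
            while (accBits >= 8) {
                accBits -= 8;
                out.writeByte((int) (acc >>> accBits)); // writes the low 8 bits
            }
        }
        if (accBits > 0) {
            out.writeByte((int) (acc << (8 - accBits))); // zero-pad the last byte
        }
    }

    // Bytes needed for the postings (one long[] per term), with the header either
    // repeated per term (self-contained) or written once up front (shared).
    static int encodedSize(long[][] postings, int bitsPerValue, boolean sharedHeader) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (sharedHeader) {
                writeHeader(out, bitsPerValue);
            }
            for (long[] term : postings) {
                if (!sharedHeader) {
                    writeHeader(out, bitsPerValue);
                }
                writeBody(out, term, bitsPerValue);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

Here the header is 23 bytes (`writeUTF` stores a 2-byte length plus 16 ASCII bytes, then a 4-byte int and 1 byte). For 100 terms, sharing it saves 99 × 23 = 2,277 bytes, and the saving grows linearly with the number of terms.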

This message is automatically generated by JIRA.