lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: Does Lucene compress postings (or posting lists) in its inverted index?
Date Sun, 17 Oct 2010 07:59:45 GMT
Hi Mahmoud,

On Sun, Oct 17, 2010 at 9:16 AM, Mahmoud Abdelkader
<mabdelkader@gmail.com> wrote:
> Hello,
>
> We're currently evaluating Lucene for indexing a large English corpus,
> and we are optimizing for space. We're basically concerned that the
> size of the postings lists will become extremely large. Does Lucene provide
> some kind of compression for the generated posting lists within the index?
> If not, is there a way to force Lucene to do this?

Before you investigate further, have a quick look at the Lucene index
file formats (http://lucene.apache.org/java/3_0_2/fileformats.html) to
get a feel for Lucene's "standard" index format. The current releases
already compress postings with techniques such as delta coding and
variable-length integers (VInt), but current trunk development might be
more interesting for you. On Lucene 4.0 trunk you will be able to plug
in customized codecs for posting lists, term dictionaries and others
like stored fields (stored fields maybe soon!). There are efforts
towards implementations like PFor
(https://issues.apache.org/jira/browse/LUCENE-1410), with others coming
up (GVInt would be nice :).
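
To give a rough idea of what delta coding plus VInt buys you, here is a
minimal sketch. It is not Lucene's actual code: the class and method
names (PostingsSketch, encode, decode) are made up for illustration, and
it stores only doc ID gaps, whereas the real .frq file also folds term
frequencies into the same stream. The byte layout follows the VInt
description in the file formats page: 7 payload bits per byte, low-order
bits first, high bit set when more bytes follow.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class PostingsSketch {

    // Encode a sorted list of non-negative doc IDs as VInt-coded gaps.
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int delta = docId - previous; // gap to the previous doc ID
            previous = docId;
            // Emit 7 bits at a time, low-order first; the high bit
            // marks that another byte follows.
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    // Decode 'count' doc IDs by reading VInts and summing the gaps.
    static int[] decode(byte[] bytes, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            docIds[i] = previous;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 130, 1000};
        byte[] encoded = encode(docIds);
        System.out.println(encoded.length + " bytes instead of "
                + (4 * docIds.length));
        System.out.println(Arrays.toString(decode(encoded, docIds.length)));
    }
}
```

For the four doc IDs above the gaps are 3, 4, 123 and 870, which take 5
bytes in total instead of 16 bytes as plain ints. The denser a term's
posting list, the smaller the gaps and the better this works.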

You should also have a look at Mike's post on PFor:
http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html

simon
>
> Thanks for the help in advance,
> Mahmoud
>
