lucene-java-user mailing list archives

From Greg Bowyer <gbow...@fastmail.co.uk>
Subject Re: Compression algorithm for posting lists
Date Mon, 28 Mar 2016 10:51:21 GMT
The posting list is compressed, using a specialised technique aimed at
sequences of integers rather than a general-purpose byte compressor.
Currently the codec uses a variant of Patched Frame of Reference (PFOR)
coding to perform this compression.
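
To make the idea concrete, here is a minimal sketch of plain Frame of
Reference coding over the gaps between sorted document IDs. It is an
illustration under assumptions, not Lucene's actual code: the class and
method names (ForSketch, toDeltas, pack, unpack, fromDeltas) are
hypothetical, it handles one small block, and it leaves out the
"patched" part, where a few unusually large values are stored
separately as exceptions so the rest of the block can use fewer bits.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and layout are not taken from Lucene.
final class ForSketch {

    /** Bits needed for the largest value in the block (at least 1). */
    static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /** Sorted, non-negative doc IDs -> gaps; the first gap is measured from 0. */
    static int[] toDeltas(int[] sortedDocIds) {
        int[] deltas = new int[sortedDocIds.length];
        int prev = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            deltas[i] = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
        }
        return deltas;
    }

    /** Inverse of toDeltas: a running sum restores the doc IDs. */
    static int[] fromDeltas(int[] deltas) {
        int[] ids = new int[deltas.length];
        int sum = 0;
        for (int i = 0; i < deltas.length; i++) {
            sum += deltas[i];
            ids[i] = sum;
        }
        return ids;
    }

    /** Store every non-negative value in exactly `bits` bits, back to back. */
    static long[] pack(int[] values, int bits) {
        long[] packed = new long[(values.length * bits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            packed[word] |= ((long) values[i]) << offset;
            if (offset + bits > 64) {
                // The value straddles two words; spill the high bits over.
                packed[word + 1] |= ((long) values[i]) >>> (64 - offset);
            }
        }
        return packed;
    }

    /** Read back `count` values of `bits` bits each. */
    static int[] unpack(long[] packed, int count, int bits) {
        int[] values = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            long v = packed[word] >>> offset;
            if (offset + bits > 64) {
                v |= packed[word + 1] << (64 - offset);
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 15, 16, 40, 41, 100};
        int[] deltas = toDeltas(docIds);
        int bits = bitsRequired(deltas);
        long[] packed = pack(deltas, bits);
        int[] restored = fromDeltas(unpack(packed, docIds.length, bits));
        System.out.println("bits per value: " + bits);
        System.out.println("round trip ok:  " + Arrays.equals(docIds, restored));
    }
}
```

In this example the largest gap is 59, so every gap fits in 6 bits and
the eight postings occupy a single 64-bit word instead of eight 32-bit
ints. A single large gap would raise the width for the whole block,
which is the problem the patched variants address.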

Good surveys of such techniques can be found in the standard IR books
(https://mitpress.mit.edu/books/information-retrieval,
http://www.amazon.com/Managing-Gigabytes-Compressing-Multimedia-Information/dp/1558605703,
http://nlp.stanford.edu/IR-book/) as well as in this paper:
http://eprints.gla.ac.uk/93572/1/93572.pdf

Interestingly, there are potentially some wins in finding better integer
codings (and one of my personal projects is aimed at doing exactly
this), but I doubt LZ4 compressing the posting list would help all that
much.

Hope this helps

On Mon, Mar 28, 2016, at 10:51 AM, Vishwas Jain wrote:
> Hello ,
> 
> We are trying to implement better compression techniques in lucene54
> codec of Apache Lucene. Currently there is no such compression for
> posting lists in lucene54 codec but LZ4 compression technique is used
> for stored fields. Does anyone know why there is no compression
> technique for postings lists? and what are the possible compression
> that would benefit if implemented?
> 
> Thanks
