lucene-dev mailing list archives

From Han Jiang <>
Subject Re: VInt block length in Lucene 4.1 postings format
Date Thu, 01 Aug 2013 09:00:11 GMT
Hi Aleksandra,

The PostingsReader uses a skip list to determine the start file
pointer of each block (both FOR-packed and vInt-encoded). This is
currently handled by Lucene41SkipReader.
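To make the skip-list idea concrete, here is a simplified, single-level sketch. It is a model built on assumptions, not Lucene code: `FlatSkipList`, `blockStartFP`, and its parameters are hypothetical names, and the real Lucene41SkipReader is multi-level with a different interface. In this model each entry stores the last docID of a full packed block and the file pointer where the following block begins, so the last entry points at the start of the trailing vInt block.

```java
import java.util.Arrays;

/**
 * Hypothetical, flattened model of per-term skip data. The real
 * Lucene41SkipReader is multi-level and exposes a different API; this only
 * illustrates how skip entries yield each block's start file pointer.
 */
public final class FlatSkipList {

    /**
     * Returns the start file pointer of the block that may contain target.
     *
     * @param termStartFP      file pointer of the term's first block
     * @param lastDocInBlock   last docID of each full (packed) block, strictly ascending
     * @param nextBlockStartFP nextBlockStartFP[i] is where block i + 1 begins
     * @param target           docID being looked for
     */
    public static long blockStartFP(long termStartFP, int[] lastDocInBlock,
                                    long[] nextBlockStartFP, int target) {
        // Count the full blocks whose last docID is below target; those can be skipped.
        int idx = Arrays.binarySearch(lastDocInBlock, target);
        int skipped = idx >= 0 ? idx : -idx - 1;
        // No block skipped: start at the term's first block; otherwise jump past the skipped ones.
        return skipped == 0 ? termStartFP : nextBlockStartFP[skipped - 1];
    }

    public static void main(String[] args) {
        int[] lastDoc = {300, 700};       // two full packed blocks (made-up docIDs)
        long[] nextFP = {1_000L, 2_100L}; // block 1 starts at 1000, the vInt tail at 2100
        System.out.println(blockStartFP(500L, lastDoc, nextFP, 42));    // 500
        System.out.println(blockStartFP(500L, lastDoc, nextFP, 450));   // 1000
        System.out.println(blockStartFP(500L, lastDoc, nextFP, 9_999)); // 2100
    }
}
```

All file pointers and docIDs in `main` are invented for illustration only.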

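The tricky part is that, for each term, the skip data sits immediately
after the TermFreqs blocks. So if you fetch the startFP of the vInt block
and know the docTermStartOffset and skipOffset of the current term, you
can calculate what you need: the vInt block ends where the skip data
begins, at docTermStartOffset + skipOffset, so its length is that
position minus its startFP.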
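The arithmetic can be sketched as follows, assuming (as described above) that the skip data starts immediately after the vInt block and that skipOffset is measured from docTermStartOffset. The class and method names are hypothetical; only the three quantities come from the reply. Terms without skip data (skipOffset < 0 in this sketch) are rejected, because their tail block has no such boundary marker to measure against.

```java
/**
 * Hypothetical helper illustrating the length calculation from the reply.
 * Assumes skipOffset is relative to docTermStartOffset and that the skip
 * data begins right after the last (vInt-encoded) block of the term.
 */
public final class VIntBlockLength {

    public static long vIntBlockLength(long docTermStartOffset, long skipOffset,
                                       long vIntBlockStartFP) {
        if (skipOffset < 0) {
            // Assumption: a negative offset means this term has no skip data.
            throw new IllegalArgumentException("term has no skip data");
        }
        // Skip data starts where the vInt block ends.
        long skipDataStartFP = docTermStartOffset + skipOffset;
        if (vIntBlockStartFP < docTermStartOffset || vIntBlockStartFP > skipDataStartFP) {
            throw new IllegalArgumentException("vInt block start lies outside the term's doc data");
        }
        // The block occupies the byte range [vIntBlockStartFP, skipDataStartFP).
        return skipDataStartFP - vIntBlockStartFP;
    }

    public static void main(String[] args) {
        // Made-up values: term data starts at 500, skip data at 500 + 1900 = 2400,
        // and the vInt tail block starts at 2100.
        System.out.println(vIntBlockLength(500L, 1_900L, 2_100L)); // 300
    }
}
```

With the length in hand, the block could in principle be copied as raw bytes instead of being decoded and re-encoded vInt by vInt.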

On Thu, Aug 1, 2013 at 4:20 PM, Aleksandra Woźniak
<> wrote:
> Hi all,
> Recently I wanted to try out some modifications to Lucene's postings format
> (namely, copying blocks that have no deletions without int-decoding/encoding
> -- this is similar to what was described here:
> I started by changing the
> Lucene 4.1 postings format to see what can be done there.
> I came across the following problem: in Lucene41PostingsReader the length
> (number of bytes) of the last, vInt-encoded block of postings is not known
> until all individual postings are read and decoded, since vInts have
> different sizes by definition. When reading this block we only know the
> number of postings that should be read and decoded.
> If I want to copy the whole block without vInt decoding/encoding, I need
> to know how many bytes to read from the postings index input. So my
> question is: is there a clean way to determine the length of this block
> (i.e. the number of bytes in this block)? Is the number of bytes in a
> posting list tracked anywhere in the Lucene 4.1 postings format?
> Thanks,
> Aleksandra
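The root of the question is that vInts are variable-length: the same number of postings can occupy a different number of bytes, so the count alone does not give the block's byte length. The sketch below is a standalone encoder written in the style of Lucene's `DataOutput.writeVInt` (7 payload bits per byte, high bit set when more bytes follow); `VIntSizes` and `encodedLength` are illustrative names, not Lucene code.

```java
import java.io.ByteArrayOutputStream;

/**
 * Standalone vInt encoder mirroring the scheme of Lucene's
 * DataOutput.writeVInt: low-order 7 bits first, continuation bit 0x80.
 * Illustration only, not Lucene code.
 */
public final class VIntSizes {

    public static void writeVInt(ByteArrayOutputStream out, int i) {
        // Emit 7 bits at a time while more significant bits remain.
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        // Final byte has the continuation bit clear.
        out.write(i);
    }

    /** Byte length of a sequence of values encoded as vInts. */
    public static int encodedLength(int... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            writeVInt(out, v);
        }
        return out.size();
    }

    public static void main(String[] args) {
        // Same number of values (3), different encoded sizes.
        System.out.println(encodedLength(1, 2, 3));        // 3
        System.out.println(encodedLength(1, 200, 70_000)); // 6 (1 + 2 + 3 bytes)
    }
}
```

This is why a reader that only knows the posting count has to decode every vInt to find where the block ends, unless the end position is available from some other structure.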

Han Jiang

Team of Search Engine and Web Mining,
School of Electronic Engineering and Computer Science,
Peking University, China

