lucene-dev mailing list archives

From Aleksandra Woźniak <aleksandra.k.wozn...@gmail.com>
Subject VInt block length in Lucene 4.1 postings format
Date Thu, 01 Aug 2013 08:20:02 GMT
Hi all,

Recently I wanted to try out some modifications to Lucene's postings
format (namely, copying blocks that have no deletions without
int-decoding/encoding, similar to what was described here:
https://issues.apache.org/jira/browse/LUCENE-2082). I started by modifying
the Lucene 4.1 postings format to see what can be done there.

I came across the following problem: in Lucene41PostingsReader the length
(number of bytes) of the last, vInt-encoded, block of postings is not known
until all individual postings in it have been read and decoded. When reading
this block we only know the number of postings that should be read and
decoded, not its size in bytes, because vInts have variable sizes by definition.
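To illustrate why the byte length is unknown up front, here is a minimal sketch of vInt encoding in the style of Lucene's `DataOutput.writeVInt` (7 data bits per byte, low-order group first, high bit set on every byte except the last). The function names `write_vint` and `read_vint` are hypothetical helpers for this example, not Lucene API, and the sketch only handles non-negative values. Two blocks with the same number of values can occupy different numbers of bytes.

```python
from typing import Tuple


def write_vint(value: int) -> bytes:
    """Encode a non-negative int as a vInt: 7 data bits per byte,
    low-order group first, high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # more bytes follow
        value >>= 7
    out.append(value)  # final byte: high bit clear
    return bytes(out)


def read_vint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one vInt starting at pos; return (value, position after it)."""
    value = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b < 0x80:  # high bit clear marks the last byte of this vInt
            return value, pos
        shift += 7


# Hypothetical doc deltas: both blocks hold 4 values but differ in byte size.
small = b"".join(write_vint(d) for d in [1, 5, 200, 3])
large = b"".join(write_vint(d) for d in [300, 20000, 2, 7])
print(len(small), len(large))
```

Here `small` takes 5 bytes (200 needs two) and `large` takes 7 (300 needs two, 20000 needs three), so knowing only the count of 4 values does not tell a reader how many bytes to copy.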

If I want to copy the whole block without vInt decoding/encoding, I need
to know how many bytes to read from the postings index input. So my
question is: is there a clean way to determine the length of this block
(i.e. the number of bytes it occupies)? Is the number of bytes in a
posting list tracked anywhere in the Lucene 4.1 postings format?
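One possible workaround, sketched here as an assumption rather than anything Lucene 4.1 provides: if the exact number of vInts in the block is known, its byte length can be found by counting terminator bytes (high bit clear) without reconstructing any values. This still touches every byte, but skips the decode/re-encode work. The helper `vint_run_length` is hypothetical. The caveat is the count itself: if the block interleaves other vInts with the doc deltas (for example, frequencies when the field indexes them, which is my understanding of how Lucene41PostingsWriter writes the last block and worth checking against its source), `count` must include those too.

```python
def vint_run_length(buf: bytes, pos: int, count: int) -> int:
    """Return the number of bytes taken by `count` consecutive vInts
    starting at `pos`, by counting bytes whose high bit is clear (each
    such byte ends one vInt). Values are never decoded. Raises
    IndexError if the buffer ends before `count` vInts are seen."""
    start = pos
    remaining = count
    while remaining > 0:
        if buf[pos] < 0x80:  # terminator byte of one vInt
            remaining -= 1
        pos += 1
    return pos - start


# vInt bytes for the hypothetical values [300, 20000, 2, 7]:
# 300 -> AC 02, 20000 -> A0 9C 01, 2 -> 02, 7 -> 07
block = bytes([0xAC, 0x02, 0xA0, 0x9C, 0x01, 0x02, 0x07])
print(vint_run_length(block, 0, 4))
```

With the byte count in hand, a merge could copy that many raw bytes from the input to the output instead of decoding and re-encoding each posting.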

Thanks,
Aleksandra
