lucene-dev mailing list archives

From Paul Elschot <>
Subject Re: Could positions/payloads in SegmentMerger be copied directly?
Date Tue, 23 Sep 2008 21:16:46 GMT
On Tuesday 23 September 2008 20:26:18 Michael McCandless wrote:
> Paul Elschot wrote:
> > So, adding a document offset from the documents/frequencies
> > into the positions/payloads for each document would allow:
> > -  bulk copying of the position/payloads during merging, and
> > -  a more efficient implementation of TermPositions.skipTo()
> >   in that decoding the positions from the last available skip
> >   document to the target of skipTo() could be avoided.
> > Is that correct?
> Yes, though this would also add cost of computing/writing/reading
> that new offset, and would increase the index size.
> > That would indeed be invasive.
> Yes.  I think our time would likely be better spent working on using
> PForDelta for freq/prox.

To change the prox data to PForDelta, it would be nice to have
bulk copying of the prox data working first. That would make it
easy to change the total size of the prox data.

But it appears easier to start with the doc/freq data, add
more prox pointers there, and then change the prox data.
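The effect of prox pointers in the doc/freq data can be sketched as follows. This is a minimal illustration under an assumed, hypothetical layout, not Lucene's actual file format: for each document of one term, the doc/freq side records the byte offset where that document's positions start in the prox stream, and positions are written as VInt deltas that restart in every document. With those offsets, `decodeDoc` can seek straight to one document's positions, as a `skipTo()` could, and `copyLiveDocs` can copy the prox bytes of non-deleted documents without decoding any position. The class and method names are made up for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class ProxPointerSketch {

    // Lucene-style VInt: 7 payload bits per byte, high bit set when more bytes follow.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Prox bytes for one term plus the (hypothetical) per-document start offsets.
    static final class Postings {
        final byte[] prox;
        final int[] proxStart; // proxStart[d] = first byte of doc d; last entry = prox.length
        Postings(byte[] prox, int[] proxStart) {
            this.prox = prox;
            this.proxStart = proxStart;
        }
    }

    static Postings encode(int[][] positionsPerDoc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] starts = new int[positionsPerDoc.length + 1];
        for (int d = 0; d < positionsPerDoc.length; d++) {
            starts[d] = out.size(); // the extra pointer that would go into the doc/freq data
            int last = 0;
            for (int pos : positionsPerDoc[d]) {
                writeVInt(out, pos - last); // deltas restart at 0 in every document
                last = pos;
            }
        }
        starts[positionsPerDoc.length] = out.size();
        return new Postings(out.toByteArray(), starts);
    }

    // skipTo()-style access: seek directly to doc d and decode only its positions.
    static int[] decodeDoc(Postings p, int d) {
        int[] buf = new int[p.proxStart[d + 1] - p.proxStart[d]]; // at most one position per byte
        int n = 0, last = 0, i = p.proxStart[d];
        while (i < p.proxStart[d + 1]) {
            int value = 0, shift = 0;
            byte b;
            do {
                b = p.prox[i++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            last += value;
            buf[n++] = last;
        }
        return Arrays.copyOf(buf, n);
    }

    // Merge-time bulk copy: append the byte ranges of live docs, no position decoding.
    static byte[] copyLiveDocs(Postings p, boolean[] deleted) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int d = 0; d < deleted.length; d++) {
            if (!deleted[d]) {
                out.write(p.prox, p.proxStart[d], p.proxStart[d + 1] - p.proxStart[d]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Postings p = encode(new int[][] {{1, 5}, {3, 200}, {7}});
        System.out.println(Arrays.toString(p.proxStart));     // [0, 2, 5, 6]
        System.out.println(Arrays.toString(decodeDoc(p, 1))); // [3, 200]
        System.out.println(Arrays.toString(
                copyLiveDocs(p, new boolean[] {false, true, false}))); // [1, 4, 7]
    }
}
```

The cost Michael points out shows up here as the `proxStart` array: a real index would have to compute, write, read and store it, which increases the index size.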

PForDelta is fundamentally different from the existing index data
because an encoded number generally does not start at a byte
boundary. I don't know yet how to deal with that in the index
data structures.
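The byte-boundary issue can be seen in a minimal sketch of the fixed-width bit packing used inside PForDelta-style codecs. Assumptions: one block of small non-negative numbers (for example position deltas) is packed with a single bit width `b`, and PForDelta's exception handling for values that do not fit in `b` bits is omitted. `BitPackSketch` and its methods are hypothetical names, not Lucene code. Value `i` starts at bit `i * b`, so with `b = 3` the second value starts at bit 3, inside the first byte, whereas a VInt always starts on a byte.

```java
import java.util.Arrays;

public class BitPackSketch {

    // Packs each value into bitWidth bits (1..32), least significant bits first.
    // Values are assumed to fit in bitWidth bits; PForDelta exceptions are omitted.
    static long[] pack(int[] values, int bitWidth) {
        long mask = (1L << bitWidth) - 1;
        long[] words = new long[(int) (((long) values.length * bitWidth + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long v = values[i] & mask;
            long bitPos = bitOffset(bitWidth, i);
            int word = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            words[word] |= v << offset;
            if (offset + bitWidth > 64) { // value straddles two longs
                words[word + 1] |= v >>> (64 - offset);
            }
        }
        return words;
    }

    // Random access to value i: compute its bit offset, then extract bitWidth bits.
    static int get(long[] words, int bitWidth, int i) {
        long bitPos = bitOffset(bitWidth, i);
        int word = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long v = words[word] >>> offset;
        if (offset + bitWidth > 64) {
            v |= words[word + 1] << (64 - offset);
        }
        return (int) (v & ((1L << bitWidth) - 1));
    }

    static int[] unpack(long[] words, int bitWidth, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = get(words, bitWidth, i);
        }
        return out;
    }

    // A value is byte-aligned only when this offset is a multiple of 8.
    static long bitOffset(int bitWidth, int i) {
        return (long) i * bitWidth;
    }

    public static void main(String[] args) {
        int[] deltas = {5, 3, 7, 1, 6};
        long[] packed = pack(deltas, 3);
        System.out.println(Arrays.toString(unpack(packed, 3, deltas.length))); // [5, 3, 7, 1, 6]
        for (int i = 0; i < deltas.length; i++) {
            long bit = bitOffset(3, i);
            System.out.println("value " + i + " starts at bit " + bit
                    + (bit % 8 == 0 ? " (byte-aligned)" : " (not byte-aligned)"));
        }
    }
}
```

Random access within one packed block stays cheap because `get` only needs `i * b`; what is lost is the ability to address a single value with a plain byte offset, so a pointer into such data would have to carry a bit offset or point at the start of a block.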

Paul Elschot
