lucene-dev mailing list archives

From Paul Elschot
Subject Re: Could positions/payloads in SegmentMerger be copied directly?
Date Sat, 20 Sep 2008 19:47:15 GMT
On Friday 19 September 2008 17:05:29 Michael McCandless wrote:
> Not quite, because how positions are encoded depends on whether any
> payload appeared in that segment.
> However, if 1) the input is a SegmentReader (since in general we can
> merge any IndexReader), and 2) its format is "congruent" with the
> format we are writing (ie both don't or do use the payloads format),
> which ought to be true the vast majority of the time, then I think we
> could simply copy bytes.  Since the next TermInfo tells us the
> proxPointer where it begins, we know exactly how many bytes to copy.
> I think this'd be a nice optimization!
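
To illustrate why the two formats have to be "congruent" before bytes can
be copied, here is a minimal standalone sketch of the two position
encodings as I understand them from the Lucene index file formats
description. It is not Lucene code: the class ProxEncodingSketch and its
methods are hypothetical names, and a ByteArrayOutputStream stands in for
the prox output.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; none of these names exist in Lucene.
class ProxEncodingSketch {

  // Variable-length int: 7 bits per byte, high bit set on all but the last byte.
  static void writeVInt(ByteArrayOutputStream out, int i) {
    while ((i & ~0x7F) != 0) {
      out.write((i & 0x7F) | 0x80);
      i >>>= 7;
    }
    out.write(i);
  }

  // Segment without payloads: each position is stored as a plain delta.
  static byte[] encodePlain(int[] positions) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int last = 0;
    for (int p : positions) {
      writeVInt(out, p - last);
      last = p;
    }
    return out.toByteArray();
  }

  // Segment with payloads: the delta is shifted left by one bit, and the low
  // bit signals that a new payload length follows. payloads[i] may be empty.
  static byte[] encodeWithPayloads(int[] positions, byte[][] payloads) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int last = 0;
    int lastLength = -1; // forces the first payload length to be written
    for (int i = 0; i < positions.length; i++) {
      int delta = positions[i] - last;
      last = positions[i];
      int length = payloads[i].length;
      if (length != lastLength) {
        writeVInt(out, (delta << 1) | 1);
        writeVInt(out, length);
        lastLength = length;
      } else {
        writeVInt(out, delta << 1);
      }
      out.write(payloads[i], 0, length);
    }
    return out.toByteArray();
  }
}
```

The same positions produce different bytes under the two encodings, so a
raw copy is only safe when the input segment and the merged segment use
the same one.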

I tried to find a way to do this, but I'm stuck at the point where
the proxPointer is needed from a TermInfo.
I got this far (uncompiled code; smi is the SegmentMergeInfo
that is currently being merged):

    if (smi.reader instanceof SegmentReader) {
      SegmentReader inputReader = (SegmentReader) smi.reader;
      boolean readerStorePayloads = 
      if (storePayloads == readerStorePayloads) {
        // take the difference of the two prox pointers:
        long positionsLength = inputReader.tis. ... -  ...;
        // do a direct byte copy from inputReader to proxOutput:
        ... ;

but I could not find out how to get from the TermInfosReader
at inputReader.tis to the next prox pointer.
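
The copy itself, once both pointers are known, can be sketched on its own.
The following is a minimal standalone model, not Lucene code: the class
ProxRangeCopy, the proxPointers array and the byte array standing in for
the positions file are all hypothetical. It also assumes that every
document of the term is carried over unchanged by the merge.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for a segment's positions data; not Lucene code.
class ProxRangeCopy {

  /**
   * Copies the positions bytes of term i from proxData to out and returns
   * the number of bytes copied. proxPointers[i] is the offset at which the
   * positions of term i start, with terms in dictionary order. The last
   * term runs to the end of the data.
   */
  static long copyTermPositions(byte[] proxData, long[] proxPointers, int i,
                                ByteArrayOutputStream out) {
    long start = proxPointers[i];
    long end = (i + 1 < proxPointers.length) ? proxPointers[i + 1] : proxData.length;
    // take the difference of the two prox pointers:
    long length = end - start;
    // do a direct byte copy, without decoding any position or payload:
    out.write(proxData, (int) start, (int) length);
    return length;
  }

  public static void main(String[] args) {
    byte[] proxData = {1, 2, 3, 4, 5, 6, 7};
    long[] proxPointers = {0, 3, 5};
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long copied = copyTermPositions(proxData, proxPointers, 1, out);
    System.out.println(copied + " bytes copied");
  }
}
```

The open question remains how to obtain the equivalent of
proxPointers[i + 1] from the TermInfosReader.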

SegmentMerger never needs to index the positions by using a
proxPointer itself, as it accesses all positions serially. This leaves
me without an example of how to use proxPointer from a TermInfo.

Any tips on how to continue?

Paul Elschot

> Mike
> Paul Elschot wrote:
> > I'm looking at the for loop in at line 666,
> > which completely interprets the input positions/payloads for
> > an input term at a document.
> >
> > The positions/payloads don't change when they are merged, is that
> > correct? I'm wondering whether this loop could be replaced by a
> > direct copy from the input postings to proxOutput.
> >
> > Regards,
> > Paul Elschot