lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Could positions/payloads in SegmentMerger be copied directly?
Date Sun, 21 Sep 2008 12:57:12 GMT

This part is indeed quite tricky... I'll try to take a stab at it.
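
To make the idea quoted below concrete, here is a minimal, self-contained sketch of the "copy the bytes between two prox pointers" approach. It does not use Lucene's classes: `Segment`, `proxEnd`, `canCopyDirectly` and `copyProx` are hypothetical names, the byte array stands in for a segment's positions file, and the `long[]` stands in for the proxPointer values that would come from consecutive TermInfo entries. It also assumes that no documents were deleted from the input segment, since a raw copy would presumably carry the positions of deleted documents along.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Lucene.
final class ProxCopySketch {

    /** Stand-in for one input segment's positions data. */
    static final class Segment {
        final byte[] prox;           // encoded positions of all terms, in term order
        final long[] proxPointers;   // start offset of each term's positions in prox
        final boolean storePayloads; // whether the payloads encoding was used

        Segment(byte[] prox, long[] proxPointers, boolean storePayloads) {
            this.prox = prox;
            this.proxPointers = proxPointers;
            this.storePayloads = storePayloads;
        }

        /** End offset of a term's positions: the next term's pointer, or end of data. */
        long proxEnd(int termIndex) {
            return termIndex + 1 < proxPointers.length
                    ? proxPointers[termIndex + 1]
                    : prox.length;
        }
    }

    /** The formats are "congruent" when both sides agree on the payloads encoding. */
    static boolean canCopyDirectly(Segment in, boolean outputStorePayloads) {
        return in.storePayloads == outputStorePayloads;
    }

    /** Copies one term's encoded positions verbatim; returns the number of bytes copied. */
    static int copyProx(Segment in, int termIndex, ByteArrayOutputStream proxOutput) {
        long start = in.proxPointers[termIndex];
        int length = (int) (in.proxEnd(termIndex) - start);
        proxOutput.write(in.prox, (int) start, length);
        return length;
    }

    public static void main(String[] args) {
        Segment seg = new Segment(new byte[] {1, 2, 3, 4, 5, 6}, new long[] {0, 2, 5}, false);
        ByteArrayOutputStream proxOutput = new ByteArrayOutputStream();
        if (canCopyDirectly(seg, false)) {
            int copied = copyProx(seg, 1, proxOutput); // positions of the middle term
            System.out.println(copied + " bytes: " + Arrays.toString(proxOutput.toByteArray()));
        }
    }
}
```

The last term has no successor, so the sketch falls back to the end of the data for its length; a real implementation would need an equivalent rule for the final term in the segment.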

Paul Elschot wrote:

> On Friday 19 September 2008 17:05:29, Michael McCandless wrote:
>> Not quite, because how positions are encoded depends on whether any
>> payload appeared in that segment.
>>
>> However, if 1) the input is a SegmentReader (since in general we can
>> merge any IndexReader), and 2) its format is "congruent" with the
>> format we are writing (i.e., both either use or don't use the payloads
>> format), which ought to be true the vast majority of the time, then I
>> think we could simply copy bytes.  Since the next TermInfo tells us the
>> proxPointer where the next term's positions begin, we know exactly how
>> many bytes to copy.  I think this'd be a nice optimization!
>
> I tried to find a way to do this, but I'm stuck at the point where
> the proxPointer is needed from a TermInfo.
> I got this far (uncompiled code, smi is the SegmentMergeInfo
> that is currently merged):
>
>    if (smi.reader instanceof SegmentReader) {
>      // smi.reader is declared as an IndexReader, so a cast is needed:
>      SegmentReader inputReader = (SegmentReader) smi.reader;
>      boolean readerStorePayloads =
>        inputReader.fieldInfos.fieldInfo(smi.term.field).storePayloads;
>      if (storePayloads == readerStorePayloads) {
>        // take the difference of the two prox pointers
>        // (long, assuming proxPointer is a long in TermInfo):
>        long positionsLength = inputReader.tis. ... -  ...;
>        // do a direct byte copy from inputReader to proxOutput:
>        ... ;
>      }
>    }
>
> but I could not find out how to get from the TermInfosReader
> at inputReader.tis to the next prox pointer.
>
> SegmentMerger never needs to seek into the positions by using a
> proxPointer itself, as it accesses all positions serially. This leaves
> me without an example of how to use the proxPointer from a TermInfo.
>
> Any tips on how to continue?
>
> Regards,
> Paul Elschot
>
>
>> Mike
>>
>> Paul Elschot wrote:
>>> I'm looking at the for loop in SegmentMerger.java at line 666,
>>> which completely interprets the input positions/payloads for
>>> an input term at a document.
>>>
>>> The positions/payloads don't change when they are merged, is that
>>> correct? I'm wondering whether this loop could be replaced by a
>>> direct copy from the input postings to proxOutput.
>>>
>>> Regards,
>>> Paul Elschot
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org

