lucene-java-user mailing list archives

From Marvin Humphrey <>
Subject Re: Flex & Docs/AndPositionsEnum
Date Wed, 10 Feb 2010 13:27:43 GMT
On Wed, Feb 10, 2010 at 06:58:01AM -0500, Michael McCandless wrote:
> But why didn't you have the Multi*Enums layer add the offset (so that
> the codec need not know who's consuming it)?  Performance?

That would have involved something like this within the aggregator:
    posting.setDocID(posting.getDocID() + docBase);

The problem is that that's the docID the SegPostingList is using for its
deltas.  If the SegPostingList skips during a call to advance(), it needs to
reset that docID to what the skip data says -- but if the aggregator layer
doesn't tell it that it needs to account for a docBase, the new docID will
lose the offset.  Can't solve that problem at the aggregator level either --
the aggregator doesn't know when skipping is occurring, so it can't intervene
on an as-needed basis.
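
A minimal Java sketch of that failure mode follows. Every name in it (`Posting`, `SegPostingList`, `NaiveAggregator`, `Demo`) is hypothetical, not Lucy's or Lucene's real API, and the "skip data" is simplified to an array holding every segment-local doc ID rather than a sparse skip list. The aggregator seeds its docBase into the shared Posting once, so delta decoding carries the offset forward, but a skip resets the docID from skip data and the offset disappears.

```java
// Hypothetical sketch: these classes are illustrative, not Lucy/Lucene API.

/** Mutable posting shared between the segment list and its consumer. */
class Posting {
    private int docID = 0;
    int getDocID() { return docID; }
    void setDocID(int docID) { this.docID = docID; }
}

/** Segment posting list that decodes deltas relative to Posting's docID. */
class SegPostingList {
    final Posting posting = new Posting();
    private final int[] deltas;   // what the postings file would store
    private final int[] skipDocs; // simplified skip data: segment-local IDs
    private int pos = -1;

    SegPostingList(int... docIDs) {
        skipDocs = docIDs.clone();
        deltas = new int[docIDs.length];
        int prev = 0;
        for (int i = 0; i < docIDs.length; i++) {
            deltas[i] = docIDs[i] - prev;
            prev = docIDs[i];
        }
    }

    boolean next() {
        if (++pos >= deltas.length) return false;
        // The delta is added to whatever docID the shared Posting holds.
        posting.setDocID(posting.getDocID() + deltas[pos]);
        return true;
    }

    /** Skip to the first doc >= target (segment-local). */
    boolean advance(int target) {
        for (int i = pos + 1; i < skipDocs.length; i++) {
            if (skipDocs[i] >= target) {
                pos = i;
                // Reset from skip data: segment-local, so any docBase is gone.
                posting.setDocID(skipDocs[i]);
                return true;
            }
        }
        pos = skipDocs.length;
        return false;
    }
}

/** Aggregator that adds its docBase to the shared Posting, once. */
class NaiveAggregator {
    private final SegPostingList sub;
    private final int docBase;
    private boolean seeded = false;

    NaiveAggregator(SegPostingList sub, int docBase) {
        this.sub = sub;
        this.docBase = docBase;
    }

    private void seedOnce() {
        if (!seeded) {
            Posting posting = sub.posting;
            posting.setDocID(posting.getDocID() + docBase);
            seeded = true;
        }
    }

    boolean next() {
        seedOnce();
        return sub.next();
    }

    boolean advance(int target) {
        seedOnce();
        // It can translate the target, but it cannot see whether the sub-list
        // consulted skip data, so it cannot know to re-apply docBase.
        return sub.advance(target - docBase);
    }

    int docID() { return sub.posting.getDocID(); }
}

class Demo {
    public static void main(String[] args) {
        NaiveAggregator agg = new NaiveAggregator(new SegPostingList(1, 4, 9, 15), 100);
        agg.next();
        agg.next();
        System.out.println(agg.docID()); // 104
        agg.advance(112);
        System.out.println(agg.docID()); // 15, not 115: the offset was lost
    }
}
```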

The fix was to make SegPostingList aware of a docBase, so that on skipping it
could add it to the docID in the skip data and land at the right docID from
the perspective of the consumer.  Messy.
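
The fix can be sketched the same way, again with hypothetical names and the same simplified skip data: the segment list receives its docBase at construction, starts delta decoding from it, and re-applies it whenever it lands on a skip-data entry, so the aggregator only has to forward calls.

```java
// Hypothetical sketch of the fix: the segment list knows its docBase.

class Posting {
    private int docID = 0;
    int getDocID() { return docID; }
    void setDocID(int docID) { this.docID = docID; }
}

class SegPostingList {
    final Posting posting = new Posting();
    private final int docBase;    // supplied by the aggregator at setup
    private final int[] deltas;
    private final int[] skipDocs; // simplified skip data: segment-local IDs
    private int pos = -1;

    SegPostingList(int docBase, int... docIDs) {
        this.docBase = docBase;
        skipDocs = docIDs.clone();
        deltas = new int[docIDs.length];
        int prev = 0;
        for (int i = 0; i < docIDs.length; i++) {
            deltas[i] = docIDs[i] - prev;
            prev = docIDs[i];
        }
        // The first delta decodes on top of docBase, so IDs come out global.
        posting.setDocID(docBase);
    }

    boolean next() {
        if (++pos >= deltas.length) return false;
        posting.setDocID(posting.getDocID() + deltas[pos]);
        return true;
    }

    /** Skip to the first doc >= target, where target is a global doc ID. */
    boolean advance(int target) {
        for (int i = pos + 1; i < skipDocs.length; i++) {
            int globalID = skipDocs[i] + docBase; // re-apply the offset
            if (globalID >= target) {
                pos = i;
                posting.setDocID(globalID);
                return true;
            }
        }
        pos = skipDocs.length;
        return false;
    }
}

class Demo {
    public static void main(String[] args) {
        SegPostingList seg = new SegPostingList(100, 1, 4, 9, 15);
        seg.next();
        seg.next();
        System.out.println(seg.posting.getDocID()); // 104
        seg.advance(112);
        System.out.println(seg.posting.getDocID()); // 115
    }
}
```

The trade-off is the coupling described above: a segment-level class now carries a multi-segment concept.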

I suppose another possibility would have been to have the aggregator keep its
own Posting and copy all data over from the SegPostingList's Posting on each
iteration, then add its offset.  However, that would have been a lot less
efficient, and it still wouldn't have worked for the "flat positions space"
example because the generic aggregator would not have known about the needs of
the specific codec.
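
The copy-per-iteration alternative might look roughly like this hypothetical Java sketch. The sub-list stays in segment-local IDs, so skipping is safe, but every call pays for a copy, and a generic copy only knows the base fields. `CodecPosting.codecState` is an invented stand-in for whatever extra state a specific codec might need; it is not meant to model the "flat positions space" example itself.

```java
// Hypothetical sketch of the copy-per-iteration alternative.

/** Generic posting fields that any aggregator knows how to copy. */
class Posting {
    int docID;
    int freq;

    void copyBaseFields(Posting src) {
        docID = src.docID;
        freq = src.freq;
    }
}

/** Hypothetical codec-specific posting carrying extra state. */
class CodecPosting extends Posting {
    int codecState;
}

/** Segment list working purely in segment-local doc IDs. */
class SegPostingList {
    final CodecPosting posting = new CodecPosting();
    private final int[] docIDs;
    private final int[] freqs;
    private int pos = -1;

    SegPostingList(int[] docIDs, int[] freqs) {
        this.docIDs = docIDs.clone();
        this.freqs = freqs.clone();
    }

    boolean next() {
        if (++pos >= docIDs.length) return false;
        posting.docID = docIDs[pos];
        posting.freq = freqs[pos];
        posting.codecState = docIDs[pos] * 2; // arbitrary codec-only value
        return true;
    }

    boolean advance(int target) {
        // A linear scan stands in for skipping; the sub never sees the offset.
        while (next()) {
            if (posting.docID >= target) return true;
        }
        return false;
    }
}

/** Aggregator that owns its Posting and copies into it each iteration. */
class CopyingAggregator {
    final Posting posting = new Posting();
    private final SegPostingList sub;
    private final int docBase;

    CopyingAggregator(SegPostingList sub, int docBase) {
        this.sub = sub;
        this.docBase = docBase;
    }

    private boolean publish(boolean found) {
        if (found) {
            posting.copyBaseFields(sub.posting); // a copy on every call
            posting.docID += docBase;            // the offset stays up here
        }
        return found;
    }

    boolean next() { return publish(sub.next()); }

    boolean advance(int target) { return publish(sub.advance(target - docBase)); }
}

class Demo {
    public static void main(String[] args) {
        SegPostingList seg = new SegPostingList(new int[] {1, 4, 9, 15}, new int[] {2, 1, 3, 1});
        CopyingAggregator agg = new CopyingAggregator(seg, 100);
        agg.next();
        agg.advance(112);
        System.out.println(agg.posting.docID);                   // 115
        System.out.println(agg.posting instanceof CodecPosting); // false
    }
}
```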

> > That example may not be a deal breaker for you, but I'm not willing
> > to guarantee that Lucy will always return primitives from these
> > enums, now and forever, one per method call.
> But it'd be a major API change down the road to change this, for
> Lucy/KS?  

I suppose so.  It's either foreclose on the possibility of aggregating (Lucy),
or foreclose on the possibility of using properties that cannot be aggregated.

> Also, this is why we're adding Attribute* to all the postings enums,
> with flex -- any codec & consumer can use their own private
> attributes.  The attrs pass through Multi*Enum.

Hmm.  Does that mean that the consumer needs to refresh the attributes with
each iteration?  Because what happens when you switch sub-enums within the
Multi*Enum?  Don't those attributes go stale, as they belong to a sub-enum
that has finished?
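
To illustrate the concern, here is a hypothetical Java sketch that assumes a Multi*Enum simply passes through the current sub-enum's attribute object. It makes no claim about how flex actually wires attributes through Multi*Enum. `PayloadAttr`, `SubEnum`, `MultiEnum`, and `Demo` are invented names. A consumer that fetches the attribute once keeps reading the finished sub-enum's instance after the switch.

```java
// Hypothetical sketch: how a cached sub-enum attribute could go stale.

/** Hypothetical per-enum attribute, one instance per sub-enum. */
class PayloadAttr {
    int value;
}

class SubEnum {
    final PayloadAttr attr = new PayloadAttr();
    private final int[] docs;
    private final int[] payloads;
    private int pos = -1;

    SubEnum(int[] docs, int[] payloads) {
        this.docs = docs.clone();
        this.payloads = payloads.clone();
    }

    /** Returns the next segment-local doc, or -1 when exhausted. */
    int nextDoc() {
        if (++pos >= docs.length) return -1;
        attr.value = payloads[pos];
        return docs[pos];
    }
}

class MultiEnum {
    private final SubEnum[] subs;
    private final int[] docBases;
    private int cur = 0;

    MultiEnum(SubEnum[] subs, int[] docBases) {
        this.subs = subs;
        this.docBases = docBases;
    }

    /** Passes through whichever sub-enum is current right now. */
    PayloadAttr attr() { return subs[cur].attr; }

    int nextDoc() {
        while (true) {
            int doc = subs[cur].nextDoc();
            if (doc != -1) return doc + docBases[cur];
            if (cur == subs.length - 1) return -1; // everything exhausted
            cur++;                                 // switch sub-enums
        }
    }
}

class Demo {
    public static void main(String[] args) {
        SubEnum a = new SubEnum(new int[] {0, 2}, new int[] {7, 8});
        SubEnum b = new SubEnum(new int[] {1}, new int[] {9});
        MultiEnum multi = new MultiEnum(new SubEnum[] {a, b}, new int[] {0, 10});

        PayloadAttr cached = multi.attr(); // fetched once, before iterating
        multi.nextDoc();                   // doc 0
        multi.nextDoc();                   // doc 2
        int doc = multi.nextDoc();         // doc 11, now served by sub-enum b
        System.out.println(doc + " " + cached.value + " " + multi.attr().value); // 11 8 9
    }
}
```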

Marvin Humphrey
