Date: Thu, 11 Feb 2010 08:30:14 -0500
Message-ID: <9ac0c6aa1002110530k756d0f57rdd5aad89430ca716@mail.gmail.com>
Subject: Re: Flex & Docs/AndPositionsEnum
From: Michael McCandless
To: java-user@lucene.apache.org

On Wed, Feb 10, 2010 at 2:42 PM, Marvin Humphrey wrote:
> On Wed, Feb 10, 2010 at 12:33:27PM -0500, Michael McCandless wrote:
>
>> In Lucene, skipping is done through the aggregator,
>
> I had a look at MultiDocsEnum in the flex branch.  It doesn't know when
> a sub-enum is reading skip data.

I'm confused -- the MultiDocsEnum's advance method impl is the only
place where we invoke advance on the sub readers.

Oh you're saying we don't know if the underlying enum actually skipped
vs just scanned?  Isn't the skip data also based on deltas?  So even
if real skipping happened, Lucy/KS would not "lose" the offset that the
aggregator had previously added?  Or maybe I'm lost on what the issue
is here...

>> > I suppose another possibility would have been to have the aggregator
>> > keep its own Posting and copy all data over from the
>> > SegPostingList's Posting on each iteration then add its offset.
>>
>> I think this is what Lucene does (?).  EG the aggregator holds its own
>> "int doc" which it must copy to (adding the offset) from the
>> underlying sub enum.
>
> That's fine for a *primitive* type.  Modifying an int returned by a sub-enum
> doesn't affect the sub-enum.  :)
>
> The problem arises when there's an opaque *object* conveying data to the
> consumer.  The aggregator knows everything there is to know about an int, but
> it doesn't know what it needs to do to prepare an opaque object owned by the
> sub-enum for consumption at the aggregate level.

OK.

>> > However, that would have been a lot less efficient, and it still
>> > wouldn't have worked for the "flat positions space" example because
>> > the generic aggregator would not have known about the needs of the
>> > specific codec.
>>
>> But aggregator could also add the positions offset on each
>> nextPosition() call, in Lucene.  Like that use case could be made to
>> work, if Lucene had used a flat position space.
>
> A generic aggregator wouldn't know that it needed to do that.  The postings
> codec developer would be forced to write aggregation code in addition to
> segment-level code.

Right, if position were not primitive but contained within an opaque
(to the aggregator) object.  And, you were doing the flat positions
space.

I guess... this restriction still seems academic... ie, not a real
issue in Lucene.  We use primitives in Lucene for doc/position, which
we can remap as needed.  We then require that opaque stuff (using
attributes) "survive", unchanged, when passed through the aggregator.
Either that, or, you enum segment by segment in the code.

I don't [yet] see this as an issue for Lucene...

Mike
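
For illustration, the "aggregator holds its own int doc and adds the
offset" idea can be sketched roughly as below.  This is a minimal
sketch under stated assumptions, not the flex branch's MultiDocsEnum:
SegDocs, ArraySegDocs, OffsetAggregator and the docBase array are all
hypothetical names made up for this example.

```java
// Hypothetical per-segment enum; NOT Lucene's actual API.
interface SegDocs {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Returns the next segment-local doc ID, or NO_MORE_DOCS when exhausted.
    int nextDoc();
}

// Toy sub-enum backed by a sorted array of segment-local doc IDs.
final class ArraySegDocs implements SegDocs {
    private final int[] docs;
    private int pos = -1;

    ArraySegDocs(int... docs) {
        this.docs = docs;
    }

    @Override
    public int nextDoc() {
        if (pos + 1 < docs.length) {
            return docs[++pos];
        }
        return NO_MORE_DOCS;
    }
}

// Hypothetical aggregator: walks the sub-enums in order and remaps each
// segment-local doc ID by adding that segment's base offset.
final class OffsetAggregator {
    private final SegDocs[] subs;
    private final int[] docBase; // assumed: docBase[i] is segment i's starting doc ID
    private int current = 0;

    OffsetAggregator(SegDocs[] subs, int[] docBase) {
        this.subs = subs;
        this.docBase = docBase;
    }

    int nextDoc() {
        while (current < subs.length) {
            int local = subs[current].nextDoc();
            if (local != SegDocs.NO_MORE_DOCS) {
                // The int is copied by value, so adding the offset here
                // cannot disturb the sub-enum's own state.
                return docBase[current] + local;
            }
            current++; // this segment is exhausted, move to the next one
        }
        return SegDocs.NO_MORE_DOCS;
    }
}
```

The remapping only works because the aggregator understands what an int
doc ID means.  If the sub-enum handed back an opaque object instead, the
aggregator in this sketch would have no generic way to adjust it, which
is the case Marvin describes.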