From: "Uwe Schindler"
Subject: RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)
Date: Fri, 1 Mar 2013 13:56:39 +0100

The slowdown does not come from making the doc IDs absolute (that is just an addition). It appears when you retrieve the stored fields through the top-level reader, because the composite top-level reader has to do a binary search in the reader tree to find the correct reader. This answer referred to the code pasted by the user who asked the question.

If you need top-level doc IDs because you present the global doc IDs to the user (this is how TopScoreDocCollector works, for example), you can of course add the doc base. But inside the collector it makes absolutely no sense to transform the local, relative doc IDs into absolute ones just to call a method on a top-level reader that then has to do the opposite with a binary search. In that case, use the AtomicReader directly. If you also access FieldCache, working with absolute doc IDs additionally wastes megabytes of memory and causes FieldCache insanity.
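
To make the round trip concrete, here is a minimal sketch in plain Java. It is a model only: SegmentDocIds, Segment and Composite are hypothetical stand-ins invented for this illustration, not Lucene classes, and the binary search merely imitates what a composite reader has to do to map a global doc ID back to a segment. Under those assumptions it shows that adding the doc base is one addition, while going through the top-level view costs a search that the segment-local access avoids.

```java
// Plain-Java model of per-segment vs. top-level doc IDs.
// None of these types are Lucene classes; they are hypothetical stand-ins.
final class SegmentDocIds {

    /** Stand-in for one atomic (per-segment) reader. */
    static final class Segment {
        final int docBase;      // global ID of this segment's first document
        final String[] stored;  // one "stored field" value per local doc ID

        Segment(int docBase, String[] stored) {
            this.docBase = docBase;
            this.stored = stored;
        }

        /** Direct access with a segment-relative doc ID: no search needed. */
        String document(int localDoc) {
            return stored[localDoc];
        }
    }

    /** Stand-in for a composite top-level reader over several segments. */
    static final class Composite {
        final Segment[] segments;
        final int[] docStarts;  // docStarts[i] == segments[i].docBase

        Composite(String[][] perSegment) {
            segments = new Segment[perSegment.length];
            docStarts = new int[perSegment.length];
            int base = 0;
            for (int i = 0; i < perSegment.length; i++) {
                docStarts[i] = base;
                segments[i] = new Segment(base, perSegment[i]);
                base += perSegment[i].length;
            }
        }

        /** Binary search for the last segment whose docBase is <= globalDoc. */
        int subIndex(int globalDoc) {
            int lo = 0;
            int hi = docStarts.length - 1;
            while (lo < hi) {
                int mid = (lo + hi + 1) >>> 1;
                if (docStarts[mid] <= globalDoc) {
                    lo = mid;
                } else {
                    hi = mid - 1;
                }
            }
            return lo;
        }

        /** Top-level access: search for the segment, then undo the docBase addition. */
        String document(int globalDoc) {
            Segment s = segments[subIndex(globalDoc)];
            return s.document(globalDoc - s.docBase);
        }
    }

    public static void main(String[] args) {
        Composite top = new Composite(new String[][] {
            {"a", "b", "c"}, {"d", "e"}, {"f"}
        });
        Segment current = top.segments[1]; // the segment a collector was last handed
        int localDoc = 1;                  // the ID a collect(int) call would receive

        // Inside the collector: use the segment directly.
        System.out.println(current.document(localDoc));

        // Only when a global ID must be handed out: a single addition.
        int globalDoc = current.docBase + localDoc;
        System.out.println(globalDoc);

        // The wasteful round trip: the top-level view has to search its way back.
        System.out.println(top.document(globalDoc));
    }
}
```

In this model both lookups return the same value; the only difference is the extra search and subtraction on the top-level path, which is repeated for every collected hit.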
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael Sokolov [mailto:msokolov@safaribooksonline.com]
> Sent: Friday, March 01, 2013 1:41 PM
> To: java-user@lucene.apache.org
> Cc: Uwe Schindler
> Subject: Re: TopDocCollector vs TopScoreDocCollector (semantics changed in
> 4.0, not backward compatible)
>
> On 2/28/2013 5:05 PM, Uwe Schindler wrote:
> > ... Collector instead of HitCollector (like your ancient Lucene from 2.4), you
> > have to respect the new semantics that are *different* to old HitCollector.
> > Collector works with low-level atomic readers (also in Lucene 3.x), the calls to
> > the "collect(int)" method are *not* using global document IDs, so using an
> > IndexReader from outside does not work and will never work - PERIOD: The
> > document IDs are only *relative* to the atomic reader that was passed to
> > the collector by setNextReader() before a sequence of collect() calls. To
> > make global docIds out of it, you may use readerContext.docBase, but this is
> > slower than using the low-level atomic reader.
> >
> Uwe, thanks for this lucid explanation! I wonder if you wouldn't mind
> elaborating a bit on the slowdown you refer to from using docBase to
> absolutize docIDs. I have a use case where I need to pass control to my
> caller, allowing them to *pull* results - so I don't know how many I will need.
> In the case where documents are returned in (docID) order, the code is
> actually pretty straightforward: I iterate over the atomic readers and pull
> results from each in turn. Are you saying that is slower because it prevents
> multi-threading, or is there some other reason?
>
> -Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org