Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 29315 invoked from network); 21 Mar 2009 09:51:41 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 21 Mar 2009 09:51:41 -0000 Received: (qmail 5042 invoked by uid 500); 21 Mar 2009 09:51:35 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 4985 invoked by uid 500); 21 Mar 2009 09:51:34 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 4976 invoked by uid 99); 21 Mar 2009 09:51:34 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 21 Mar 2009 02:51:34 -0700 X-ASF-Spam-Status: No, hits=1.2 required=10.0 tests=SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [74.125.92.26] (HELO qw-out-2122.google.com) (74.125.92.26) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 21 Mar 2009 09:51:26 +0000 Received: by qw-out-2122.google.com with SMTP id 8so699945qwh.53 for ; Sat, 21 Mar 2009 02:51:04 -0700 (PDT) Received: by 10.224.37.78 with SMTP id w14mr6866447qad.117.1237629064039; Sat, 21 Mar 2009 02:51:04 -0700 (PDT) Received: from ?10.17.4.4? (pool-173-48-164-75.bstnma.fios.verizon.net [173.48.164.75]) by mx.google.com with ESMTPS id 5sm5942243ywd.9.2009.03.21.02.51.02 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 21 Mar 2009 02:51:03 -0700 (PDT) Message-Id: From: Michael McCandless To: java-dev@lucene.apache.org In-Reply-To: <786fde50903202135m4a5dae62od996b1e3825c43b8@mail.gmail.com> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v930.3) Subject: Re: Is TopDocCollector's collect() implementation correct? Date: Sat, 21 Mar 2009 05:51:01 -0400 References: <786fde50903202135m4a5dae62od996b1e3825c43b8@mail.gmail.com> X-Mailer: Apple Mail (2.930.3) X-Virus-Checked: Checked by ClamAV on apache.org I think I'd lean towards a third solution: tighten up TopScoreDocCollector (make it final, remove ability to change its PQ, make things private) and have it focus on high performance collection by score. This is the default collector for Lucene searches so I think keeping it high performance (not adding additional method calls nor boolean checks) is important. You choose this for performance Then introduce a new hit collector that makes subclassing / changing the PQ easier. You choose this one for flexibility. And 2.9 is a great chance to do this since we've deprecated the old class (TopDocCollector) in favor of TopScoreDocCollector. Mike Shai Erera wrote: > Thanks Chris/Hoss (not sure who sent the original reply). > > I don't like calling pq.lessThan, as pq.insert and > pq.insertWithOverflow call it anyway internally and since it would > add a method call (something that was tried to be avoided in the > current implementation), I prefer the code I proposed below. > > BTW, I introduced 1356, so I take full responsibility on this > overlooking. The main reason for 1356 was to allow creating > extensions of TopDocCollector so they can be of the same type, and > share the topDocs() and totalHIts() implementations. > > I can file an issue. Any other comments? > > Shai > > On Sat, Mar 21, 2009 at 3:48 AM, Chris Hostetter > wrote: > > (resending msg from earlier today during @apache mail outage -- i > didn't get a copy from the list, so i'm assuming no one did) > > > ---------- Forwarded message ---------- > Date: Fri, 20 Mar 2009 15:29:13 -0700 (PDT) > > : TopDocCollector's (TDC) implementation of collect() seems a bit > problematic > : to me. > > This code isn't an area i'm very familiar with, but your assessment > seems > correct ... it looks like when LUCENE-1356 introduced the ability to > provide a PriorityQueue to the constructor, the existing > optimization when > the score was obvoiusly too low was overlooked. > > It looks like this same bug got propogated to TopScoreDocCollector > when it was introduced as well. > > : Introduce in TDC a private boolean which signals whether the > default PQ is > > : used or not. If it's not used, don't do the 'else if' at all. If > it is used, > : then the 'else if' is safe. Then code could look like: > > my vote would just be to change the ">=" comarison to a hq.lessThan > call > ... but i can understand how your proposal might be more efficient > -- I'll > let the performance experts fight it out ... but i definitely think > you > should fil a bug. > > > > -Hoss > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org > For additional commands, e-mail: java-dev-help@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org For additional commands, e-mail: java-dev-help@lucene.apache.org