From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Date: Mon, 30 Mar 2009 09:20:50 -0700 (PDT)
Subject: [jira] Commented: (LUCENE-1575) Refactoring Lucene collectors (HitCollector and extensions)
Message-ID: <1974604374.1238430050569.JavaMail.jira@brutus>
In-Reply-To: <4603413.1238172291560.JavaMail.jira@brutus>

[ 
https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693822#action_12693822 ]

Michael McCandless commented on LUCENE-1575:
--------------------------------------------

bq. How's that sound:

That sounds good! So to be consistent maybe we create ScoringTopFieldCollector and NonScoringTopFieldCollector? This means we don't need ScoreCacheScorer? (because ScoringTopFieldCollector will always grab the score).

Though how do we change FieldComparator API so as to not pass score around? All comparators except RelevanceComparator don't use it.

bq. Well, if we use ScoreCacheScorer, then this call is really fast, returning immediately and w/o computing the score.

I'm actually torn on how fast this will be: I think that will be an if statement that's hard for the CPU to predict, which is costly.

bq. So you suggest the methods on IndexSearcher today that take a Sort as parameter will default to NSTFC? As long as we document it it's ok? Are all of these new?

Hmmm... actually, no, I think those must continue to use STFC for the existing methods (to remain back compatible), but add a new search method that takes a boolean trackScore?

> Refactoring Lucene collectors (HitCollector and extensions)
> -----------------------------------------------------------
>
> Key: LUCENE-1575
> URL: https://issues.apache.org/jira/browse/LUCENE-1575
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Shai Erera
> Fix For: 2.9
>
>
> This issue is a result of a recent discussion we've had on the mailing list. You can read the thread [here|http://www.nabble.com/Is-TopDocCollector%27s-collect()-implementation-correct--td22557419.html].
> We have agreed to do the following refactoring:
> * Rename MultiReaderHitCollector to Collector, with the purpose that it will be the base class for all Collector implementations.
> * Deprecate HitCollector in favor of the new Collector.
> * Introduce new methods in IndexSearcher that accept Collector, and deprecate those that accept HitCollector.
> ** Create a final class HitCollectorWrapper, and use it in the deprecated methods in IndexSearcher, wrapping the given HitCollector.
> ** HitCollectorWrapper will be marked deprecated, so we can remove it in 3.0, when we remove HitCollector.
> ** It will remove any instanceof checks that currently exist in IndexSearcher code.
> * Create a new (abstract) TopDocsCollector, which will:
> ** Leave collect and setNextReader unimplemented.
> ** Introduce protected members PriorityQueue and totalHits.
> ** Introduce a single protected constructor which accepts a PriorityQueue.
> ** Implement topDocs() and getTotalHits() using the PQ and totalHits members. These can be used as-is by extending classes, or be overridden.
> ** Introduce a new topDocs(start, howMany) method which will be used as a convenience method when implementing a search application which allows paging through search results. It will also attempt to improve the memory allocation, by allocating a ScoreDoc[] of the requested size only.
> * Change TopScoreDocCollector to extend TopDocsCollector, and use the topDocs() and getTotalHits() implementations as they are from TopDocsCollector. The class will also be made final.
> * Change TopFieldCollector to extend TopDocsCollector, and make the class final. Also implement topDocs(start, howMany).
> * Change TopFieldDocCollector (deprecated) to extend TopDocsCollector, instead of TopScoreDocCollector. Implement topDocs(start, howMany).
> * Review other places where HitCollector is used, such as in Scorer, deprecate those places and use Collector instead.
> Additionally, the following proposal was made w.r.t. decoupling score from collect():
> * Change collect to accept only a doc Id (unbased).
> * Introduce a setScorer(Scorer) method.
> * If during collect the implementation needs the score, it can call scorer.score().
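To make the decoupling proposal above concrete, here is a minimal sketch of a collect(doc) / setScorer(Scorer) pair. Every type in it (CollectorSketch, its nested Scorer and Collector, CountingCollector, PositiveScoreCollector, ArrayScorer, and the search() loop) is a hypothetical stand-in written for illustration, not Lucene's actual classes or the patch for this issue. It only tries to show that a collector which never needs the score never triggers its computation, while one that does need it pulls it on demand.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for the types discussed in this issue; not Lucene's API. */
final class CollectorSketch {

    /** Minimal scorer: exposes only the score of the current document. */
    interface Scorer {
        float score();
    }

    /** Collector as proposed: collect() receives only a doc id. */
    abstract static class Collector {
        abstract void setScorer(Scorer scorer);
        abstract void collect(int doc);
    }

    /** Counts hits and never asks for a score. */
    static final class CountingCollector extends Collector {
        int totalHits;

        @Override
        void setScorer(Scorer scorer) {
            // Score is not needed, so the scorer is ignored.
        }

        @Override
        void collect(int doc) {
            totalHits++;
        }
    }

    /** Keeps only docs with a positive score, so it pulls the score on demand. */
    static final class PositiveScoreCollector extends Collector {
        final List<Integer> docs = new ArrayList<>();
        private Scorer scorer;

        @Override
        void setScorer(Scorer scorer) {
            this.scorer = scorer;
        }

        @Override
        void collect(int doc) {
            if (scorer.score() > 0.0f) {
                docs.add(doc);
            }
        }
    }

    /** Scorer over a fixed array that records how often score() is called. */
    static final class ArrayScorer implements Scorer {
        final float[] scores;
        int current;
        int scoreCalls;

        ArrayScorer(float[] scores) {
            this.scores = scores;
        }

        @Override
        public float score() {
            scoreCalls++;
            return scores[current];
        }
    }

    /** Drives a collector over every doc, roughly the way a search loop might. */
    static void search(ArrayScorer scorer, Collector collector) {
        collector.setScorer(scorer);
        for (int doc = 0; doc < scorer.scores.length; doc++) {
            scorer.current = doc;
            collector.collect(doc);
        }
    }
}
```

In this sketch the search loop always calls setScorer() before the first collect(), which is one possible answer to the "what if Scorer is null" question; whether the real code can guarantee that at every call site is exactly what would need review.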
> If we do this, then we need to review all places in the code where collect(doc, score) is called, and assert whether Scorer can be passed. Also this raises a few questions:
> * What if during collect() Scorer is null? (i.e., not set) - is it even possible?
> * I noticed that many (if not all) of the collect() implementations discard the document if its score is not greater than 0. Doesn't it mean that score is needed in collect() always?
> Open issues:
> * The name for Collector
> * TopDocsCollector was mentioned on the thread as TopResultsCollector, but that was when we thought to call Collector ResultsCollector. Since we decided (so far) on Collector, I think TopDocsCollector makes sense, because of its TopDocs output.
> * Decoupling score from collect().
> I will post a patch a bit later, as this is expected to be a very large patch. I will split it into 2: (1) code patch (2) test cases (moving to use Collector instead of HitCollector, as well as testing the new topDocs(start, howMany) method).
> There might be even a 3rd patch which handles the setScorer thing in Collector (maybe even a different issue?)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org