Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 18127 invoked from network); 16 Jun 2009 18:50:18 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 16 Jun 2009 18:50:18 -0000 Received: (qmail 69435 invoked by uid 500); 16 Jun 2009 18:37:30 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 69399 invoked by uid 500); 16 Jun 2009 18:37:30 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 69357 invoked by uid 99); 16 Jun 2009 18:37:29 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 16 Jun 2009 18:37:29 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 16 Jun 2009 18:37:27 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id A6D6D29A0012 for ; Tue, 16 Jun 2009 11:37:07 -0700 (PDT) Message-ID: <1938567170.1245177427682.JavaMail.jira@brutus> Date: Tue, 16 Jun 2009 11:37:07 -0700 (PDT) From: "Michael McCandless (JIRA)" To: java-dev@lucene.apache.org Subject: [jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness In-Reply-To: <772749480.1241640750498.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720280#action_12720280 ] Michael McCandless commented on LUCENE-1630: -------------------------------------------- Patch looks good! * There are some new javadoc warnings (wrong links) * Maybe, in changes, add that BooleanQuery will now score docs out of order when used with a Collector that can accept docs out of order? Ie this is a good ootb perf gain for or queries, sorting by field or score. . Oh, ugh, it seems we don't actually do that today (because we respect the static setting). Hmm. Can't we change that static default (BooleanQuery.allowDocsOutOfOrder) to true? Because, our core collectors can handle it, and any custom collector will by default say it cannot handle it so there wouldn't be a break in back compat? * That's a nice cleanup, seeing BooleanQuery decide which scorer impl to return :) * We don't need the " = null" initializer on BooleanScorer2.countingSumScorer's decl * Can we make Collector.supportsDocsOutOfOrder abstract? Defaulting to false isn't great (I'd rather subclass think about the question). * Can we rename Collector.supportsDocsOutOfOrder -> acceptsDocsOutOfOrder? * Can we also make QueryWeight.scoresOutOfOrder abstract? * CustomScoreQuery.scoresOutOfOrder should only look at its subQueryWeight? * If a given Scorer.scoresOutOfOrder returns true, does that mean nextDoc is allowed to return docs out of order? How is advance defined for such scorers? (BooleanScore does this, and its advance throws UOE). Maybe we allow this but advance may not work for such scorers? * Should scoresOutOfOrder() move from QueryWeight --> Scorer? I may call QueryWeight.scorer with scoreDocsInOrder=false, but many will in fact return a scorer that does score docs in order (eg BQ does this depending on what kind fo clauses, and how many, it has), and we could then pick a faster collector in such cases? * Shouldn't DisjunctionMaxQuery pass true for scoreDocsInOrder when asking its sub-queries for their scorers? Ie even if its callee allow out-of-order scoring, it requires in-order of its children? * I think DisjunctionMaxQuery.scoresOutOfOrder should simply return false? * Actually I think the way to factor the static setting in is backwards? Shouldn't it be {{scoreDocsInOrder |= !allowDocsOutOfOrder}}? * Can you sharpen the javadocs for boolean topScorer param? Ie, "if true, score(Collector) will be called; if false, nextDoc/advance will be called". (I found myself momentarily wondering if DocumentWriter's usage of Scorer API was a topScorer). * Shouldn't Searchable cutover to QueryWeight too? (We are keeping Searchable, but allowing changes to it) * Nice catch, removing Searchable's now-wrong NOTE about scoring when sorting by field! > Mating Collector and Scorer on doc Id orderness > ----------------------------------------------- > > Key: LUCENE-1630 > URL: https://issues.apache.org/jira/browse/LUCENE-1630 > Project: Lucene - Java > Issue Type: Improvement > Components: Search > Reporter: Shai Erera > Assignee: Michael McCandless > Fix For: 2.9 > > Attachments: LUCENE-1630.patch, LUCENE-1630.patch > > > This is a spin off of LUCENE-1593. This issue proposes to expose appropriate API on Scorer and Collector such that one can create an optimized Collector based on a given Scorer's doc-id orderness and vice versa. Copied from LUCENE-1593, here is the list of changes: > # Deprecate Weight and create QueryWeight (abstract class) with a new scorer(reader, scoreDocsInOrder), replacing the current scorer(reader) method. QueryWeight implements Weight, while score(reader) calls score(reader, false /* out-of-order */) and scorer(reader, scoreDocsInOrder) is defined abstract. > #* Also add QueryWeightWrapper to wrap a given Weight implementation. This one will also be deprecated, as well as package-private. > #* Add to Query variants of createWeight and weight which return QueryWeight. For now, I prefer to add a default impl which wraps the Weight variant instead of overriding in all Query extensions, and in 3.0 when we remove the Weight variants - override in all extending classes. > # Add to Scorer isOutOfOrder with a default to false, and override in BS to true. > # Modify BooleanWeight to extend QueryWeight and implement the new scorer method to return BS2 or BS based on the number of required scorers and setAllowOutOfOrder. > # Add to Collector an abstract _acceptsDocsOutOfOrder_ which returns true/false. > #* Use it in IndexSearcher.search methods, that accept a Collector, in order to create the appropriate Scorer, using the new QueryWeight. > #* Provide a static create method to TFC and TSDC which accept this as an argument and creates the proper instance. > #* Wherever we create a Collector (TSDC or TFC), always ask for out-of-order Scorer and check on the resulting Scorer isOutOfOrder(), so that we can create the optimized Collector instance. > # Modify IndexSearcher to use all of the above logic. > The only class I'm worried about, and would like to verify with you, is Searchable. If we want to deprecate all the search methods on IndexSearcher, Searcher and Searchable which accept Weight and add new ones which accept QueryWeight, we must do the following: > * Deprecate Searchable in favor of Searcher. > * Add to Searcher the new QueryWeight variants. Here we have two choices: (1) break back-compat and add them as abstract (like we've done with the new Collector method) or (2) add them with a default impl to call the Weight versions, documenting these will become abstract in 3.0. > * Have Searcher extend UnicastRemoteObject and have RemoteSearchable extend Searcher. That's the part I'm a little bit worried about - Searchable implements java.rmi.Remote, which means there could be an implementation out there which implements Searchable and extends something different than UnicastRemoteObject, like Activeable. I think there is very small chance this has actually happened, but would like to confirm with you guys first. > * Add a deprecated, package-private, SearchableWrapper which extends Searcher and delegates all calls to the Searchable member. > * Deprecate all uses of Searchable and add Searcher instead, defaulting the old ones to use SearchableWrapper. > * Make all the necessary changes to IndexSearcher, MultiSearcher etc. regarding overriding these new methods. > One other optimization that was discussed in LUCENE-1593 is to expose a topScorer() API (on Weight) which returns a Scorer that its score(Collector) will be called, and additionally add a start() method to DISI. That will allow Scorers to initialize either on start() or score(Collector). This was proposed mainly because of BS and BS2 which check if they are initialized in every call to next(), skipTo() and score(). Personally I prefer to see that in a separate issue, following that one (as it might add methods to QueryWeight). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org For additional commands, e-mail: java-dev-help@lucene.apache.org