Message-ID: <237039929.1250854994854.JavaMail.jira@brutus>
Date: Fri, 21 Aug 2009 04:43:14 -0700 (PDT)
From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
In-Reply-To: <1931573201.1250629228444.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745911#action_12745911 ]

Michael McCandless commented on LUCENE-1821:
--------------------------------------------

OK... pondering this some more, and on seeing just how much change would be required, I'm now nervous about making deep changes to Lucene's scoring/filtering APIs (Weight.scorer, Filter.getDocIdSet) to enable access to top readers and/or a sub-reader's doc base.

All of Lucene's core & contrib now operates "context free" (per-segment), where each reader need not know its "context" in the full searcher tree, and I think we should strongly encourage external usage of these APIs to switch to context free as well.

Since there are workarounds possible (accessing sub-readers via IndexSearcher), external apps that have problems making the switch can use these workarounds?

> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>
>                 Key: LUCENE-1821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1821
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>             Fix For: 2.9
>
>         Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per-segment basis, there is no way for a Scorer to know the "actual" doc id for the documents it matches (only the relative doc offset into the segment).
> If using caches in your scorer that are based on the "entire" index (all segments), there is now no way to index into them properly from inside a Scorer, because the Scorer is not passed the offset needed to calculate the "real" docid.
> Suggest having the Weight.scorer() method also take an integer for the doc offset.
> The abstract Weight class should have a constructor that takes this offset, as well as a method to get the offset.
> All Weights that have "sub" weights must pass this offset down to created "sub" weights.
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add an "int getIndexReaderBase(IndexReader)" method to your subclass
> * During Weight creation, the Weight must hold onto a reference to the passed-in Searcher (cast to your subclass)
> * During Scorer creation, the Scorer must be passed the result of YourSearcher.getIndexReaderBase(reader)
> * The Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader)iter.next();
>       if (r == reader) {
>         return maxDoc;
>       }
>       maxDoc += r.maxDoc();
>     }
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
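
The NOTE in the quoted workaround says the lookup gets cheaper if the sub-reader list is cached once. Below is a minimal sketch of that idea, not code from the issue or the attached patch. It is deliberately Lucene-free so it runs standalone. `SegmentStub` and `DocBaseTable` are hypothetical names; `SegmentStub` stands in for a segment-level IndexReader and exposes only `maxDoc()`. The sketch assumes the sub-readers arrive in index order, and it omits the quoted example's special case of returning 0 for the top-level reader.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a segment-level IndexReader; only maxDoc() matters here. */
final class SegmentStub {
    private final int maxDoc;

    SegmentStub(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    int maxDoc() {
        return maxDoc;
    }
}

/** Hypothetical helper: computes each sub-reader's doc base once, up front. */
final class DocBaseTable {
    // Identity-based map, mirroring the "r == reader" comparison in the quoted example.
    private final Map<SegmentStub, Integer> bases = new IdentityHashMap<>();

    /** subReaders is assumed to be in index order (first segment first). */
    DocBaseTable(List<SegmentStub> subReaders) {
        int base = 0;
        for (SegmentStub r : subReaders) {
            bases.put(r, base);   // doc base = sum of maxDoc() of all earlier segments
            base += r.maxDoc();
        }
    }

    /** Returns the doc base of the given sub-reader, or -1 if it is not in the table. */
    int docBase(SegmentStub reader) {
        Integer base = bases.get(reader);
        return base == null ? -1 : base;
    }

    /** Converts a segment-relative doc id into an index-wide doc id. */
    int rebase(SegmentStub reader, int segmentDocId) {
        int base = docBase(reader);
        if (base < 0) {
            throw new IllegalArgumentException("reader not in searcher");
        }
        return base + segmentDocId;
    }
}
```

In the workaround as described, a table like this would be built once in the IndexSearcher subclass constructor, and a Scorer would be handed the result of `docBase(reader)` so it can rebase the doc ids it collects.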