From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Date: Thu, 30 Apr 2009 03:04:30 -0700 (PDT)
Subject: [jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
Message-ID: <183255735.1241085870920.JavaMail.jira@brutus>
In-Reply-To: <848873297.1240634970628.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704556#action_12704556 ]

Michael McCandless commented on LUCENE-1614:
--------------------------------------------

bq. Just to clarify for myself, in the example I gave above, suppose that the scorer is on "3" and you call check(8).

On check(8), TermScorer would go to 10, stop there, and return false. (It would not "rewind" to 3.) Check can only be called on increasing arguments, so it's not truly "random access". It's "forward-only random access".

bq. You propose this check() so that in case a DISI can save any extra operations it does in next() (such as reading a payload for example) it will do so. Therefore in the example you give above with CS, next()'s contract forces it to advance all the sub-scorers, but with check() it could stop in the middle.

Precisely. This is important when you have a super-cheap iterator (say a somewhat sparse (<=10%?) in-memory filter that's represented as list-of-docIDs). It's very fast for such a filter to iterate over its docIDs.

But when that iterator is AND'd with a Scorer, as is done today by IndexSearcher, they effectively play "leap frog", where first it's the filter's turn to next(), then it's the Scorer's turn, etc. But for the Scorer, next() can be extremely costly, only to find the filter doesn't accept it. So for such situations it's better to let the filter drive the search, calling Scorer.check() on the docs.

But... once we switch to filter-as-BooleanClause, it's less clear whether check() is worthwhile, because I think the filter's constraint is more efficiently taken into account. For filters that support random access (if they are less sparse, say >= 25% or so), we should push them all the way down to the TermScorers and factor them in just like deletedDocs.

bq. .
If the default impl in DISI just uses nextDoc() and returns true if the return value is the requested doc, we should be safe back-compat-wise, but this is still dangerous and we need clear documentation.

Yes it does have a good default impl, I think.

bq. BTW, perhaps a testAndSet-like version can save check(10) followed by a next(10), and will fit nicer?

Not sure what you mean by "testAndSet-like version"?

> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>
> See http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html for the full discussion. The basic idea is to add variants to those two methods that return the current doc they are at, to save successive calls to doc(). If there are no more docs, return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls next() and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing to -1 for improved performance.
> I will post a patch shortly

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
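
For readers following the thread, here is a minimal sketch of the forward-only check() contract and the filter-driven loop described in the comment above. It is an illustration under assumptions, not Lucene code: SortedDocsIterator, CheckSketch and intersectFilterDriven are hypothetical names, the iterator is a toy over a sorted int array, and check() is built on advance() in the spirit of the default impl mentioned in the discussion. The -1 end marker and the '>= 0' loop test come from the issue description. Note that this sketch's advance() stays put when the iterator is already on or past the target, which is a simplification chosen here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a DocIdSetIterator; hypothetical, not Lucene's class. */
final class SortedDocsIterator {
    private final int[] docs; // doc IDs in ascending order
    private int pos = -1;     // -1 means "not positioned yet"

    SortedDocsIterator(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Current doc, or -1 if not yet positioned or exhausted. */
    int doc() {
        return (pos >= 0 && pos < docs.length) ? docs[pos] : -1;
    }

    /** Moves to the next doc and returns it, or -1 when there are no more docs. */
    int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        return doc();
    }

    /** Moves forward (never back) to the first doc >= target; returns it or -1. */
    int advance(int target) {
        if (pos < 0) {
            pos = 0;
        }
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return doc();
    }

    /** Assumed default check(): true only if the iterator lands exactly on target. */
    boolean check(int target) {
        return advance(target) == target;
    }
}

final class CheckSketch {
    /** Lets the cheap filter drive and asks the scorer to check() each candidate. */
    static List<Integer> intersectFilterDriven(SortedDocsIterator filter,
                                               SortedDocsIterator scorer) {
        List<Integer> hits = new ArrayList<>();
        int doc;
        while ((doc = filter.nextDoc()) >= 0) { // '>= 0' test as suggested in the issue
            if (scorer.check(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedDocsIterator scorer = new SortedDocsIterator(new int[] {3, 10, 15});
        scorer.nextDoc();                     // scorer is on 3
        System.out.println(scorer.check(8));  // false: stops on 10, does not rewind
        System.out.println(scorer.doc());     // 10
        System.out.println(scorer.check(10)); // true
    }
}
```

In this toy, check() saves no work compared with advance(). The point made in the thread is that a real Scorer could skip the expensive parts of next() when it only has to answer yes or no for a given doc.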