Date: Thu, 8 Nov 2012 10:49:13 +0000 (UTC)
From: "Uwe Schindler (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Subject: [jira] [Comment Edited] (LUCENE-4548) BooleanFilter should optionally pass down further restricted acceptDocs in the MUST case (and acceptDocs in general)

    [ https://issues.apache.org/jira/browse/LUCENE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493109#comment-13493109 ]

Uwe Schindler edited comment on LUCENE-4548 at 11/8/12 10:48 AM:
-----------------------------------------------------------------

I am talking about the contrib BooleanFilter (which is in my opinion a horrible class), FilteredQuery is not affected at all.

      was (Author: thetaphi):
    I am talking about the contrib BooleanFilter, FilteredQuery is not affected at all.

> BooleanFilter should optionally pass down further restricted acceptDocs in the MUST case (and acceptDocs in general)
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4548
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4548
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Uwe Schindler
>
> Spin-off from dev@lao:
> {quote}
> bq. I am about to write a Filter that only operates on a set of documents that have already passed other filter(s). It's rather expensive, since it has to use DocValues to examine a value and then determine if it's a match. So it scales O(n) where n is the number of documents it must see. The 2nd arg of getDocIdSet is Bits acceptDocs. Unfortunately Bits doesn't have an int iterator but I can deal with that seeing if it extends DocIdSet.
> bq. I'm looking at BooleanFilter which I want to use and I notice that it passes null to filter.getDocIdSet for acceptDocs, and it justifies this with the following comment:
> bq. // we dont pass acceptDocs, we will filter at the end using an additional filter
> The idea of passing the already built bits for the MUST is a good idea and can be implemented easily.
> The reason why the acceptDocs were not passed down is the new way filters work in Lucene 4.0 and to optimize caching. Because accept docs are the only thing that changes when deletions are applied and filters are required to handle them separately: whenever something is able to cache (e.g.
CachingWrapperFilter), the acceptDocs are not cached, so the underlying filters get a null acceptDocs to produce the full bitset and the filtering is done when CachingWrapperFilter gets the “up-to-date” acceptDocs. For this case, however, it does not matter if the first filter clause does not get acceptDocs; later MUST clauses can of course get them (they are not deletion-specific)!
> Can you open an issue to optimize the MUST case (possibly MUST_NOT, too)?
> Another thing that could help here: You can stop using BooleanFilter if you can apply the filters sequentially (only MUST clauses) by wrapping with multiple FilteredQuery: new FilteredQuery(new FilteredQuery(originalQuery, clause1), clause2). If the DocIdSets enable bits() and the FilteredQuery autodetection decides to use random access filters, the acceptDocs are also passed down from the outside to the inner, removing the documents filtered out.
> {quote}
> Maybe BooleanFilter should have 2 modes (boolean ctor argument): passing down the acceptDocs to every filter (for the case where Filter calculation is expensive and accept docs help to limit the calculations) or not passing them down (if the filter is cheap and the multiple acceptDocs bit checks for every single filter are more expensive – not passing them down is then more effective, e.g. when the Filter is only a cached bitset). The first mode would also optimize the MUST/MUST_NOT case to pass down the further restricted acceptDocs on later filters (just like FilteredQuery does).
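
To make the two proposed modes concrete, here is a rough, self-contained sketch. It is not the Lucene API and not a patch: SimpleFilter, CountingEvenFilter, RangeFilter and mustAll are hypothetical names invented for illustration, and java.util.BitSet stands in for Bits/DocIdSet. It only models the MUST case and shows how many documents an expensive clause has to examine in each mode.

```java
import java.util.BitSet;

/** Simplified model of the two proposed modes; NOT the Lucene API. */
public class AcceptDocsSketch {

    /** Hypothetical stand-in for a filter that receives acceptDocs. */
    interface SimpleFilter {
        // acceptDocs == null means "consider every document"
        BitSet getDocIdSet(int maxDoc, BitSet acceptDocs);
    }

    /** "Expensive" filter: matches even doc IDs and counts the docs it examines. */
    static class CountingEvenFilter implements SimpleFilter {
        int examined = 0;

        @Override
        public BitSet getDocIdSet(int maxDoc, BitSet acceptDocs) {
            BitSet result = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (acceptDocs != null && !acceptDocs.get(doc)) {
                    continue; // skipped without paying the per-document cost
                }
                examined++;
                if (doc % 2 == 0) {
                    result.set(doc);
                }
            }
            return result;
        }
    }

    /** Cheap filter: matches doc IDs below a limit (think of a cached bitset). */
    static class RangeFilter implements SimpleFilter {
        final int limit;

        RangeFilter(int limit) {
            this.limit = limit;
        }

        @Override
        public BitSet getDocIdSet(int maxDoc, BitSet acceptDocs) {
            BitSet result = new BitSet(maxDoc);
            result.set(0, Math.min(limit, maxDoc));
            if (acceptDocs != null) {
                result.and(acceptDocs);
            }
            return result;
        }
    }

    /** ANDs all MUST clauses; passAcceptDocs selects between the two modes. */
    static BitSet mustAll(int maxDoc, BitSet acceptDocs, boolean passAcceptDocs,
                          SimpleFilter... clauses) {
        BitSet live = new BitSet(maxDoc);
        if (acceptDocs == null) {
            live.set(0, maxDoc);
        } else {
            live.or(acceptDocs);
        }
        if (passAcceptDocs) {
            // Mode 1: each clause only sees docs that survived the earlier clauses.
            BitSet current = live;
            for (SimpleFilter clause : clauses) {
                BitSet matched = clause.getDocIdSet(maxDoc, current);
                matched.and(current); // defensive: never widen the set
                current = matched;
            }
            return current;
        }
        // Mode 2: every clause gets null acceptDocs; filter once at the end.
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);
        for (SimpleFilter clause : clauses) {
            result.and(clause.getDocIdSet(maxDoc, null));
        }
        result.and(live);
        return result;
    }

    public static void main(String[] args) {
        CountingEvenFilter restricted = new CountingEvenFilter();
        BitSet a = mustAll(1000, null, true, new RangeFilter(10), restricted);

        CountingEvenFilter unrestricted = new CountingEvenFilter();
        BitSet b = mustAll(1000, null, false, new RangeFilter(10), unrestricted);

        System.out.println(a + " examined=" + restricted.examined);
        System.out.println(b + " examined=" + unrestricted.examined);
    }
}
```

Both modes return the same matches; in this toy setup the expensive clause examines 10 documents when the restricted acceptDocs are passed down and 1000 when they are not. Whether the extra per-clause bit checks pay off for cheap, cached filters is exactly the trade-off the description raises, and this sketch does not measure it.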