From: Shai Erera
Date: Wed, 11 Mar 2015 15:05:55 +0200
Subject: Re: Filtering question
To: "java-user@lucene.apache.org"

I don't see that you use acceptDocs in your MyNDVFilter. I think
acceptDocs.get() would return false for all UserB docs, but you should
confirm that.

Anyway, because you use an NDV field, you can't automatically skip unrelated
documents. Instead, your code would look something like:

for (int i = 0; i < reader.maxDoc(); i++) {
  if (!acceptDocs.get(i)) {
    continue;
  }
  // document is accepted, read values
  ...
}

Shai

On Wed, Mar 11, 2015 at 1:25 PM, Ian Lea wrote:

> Can you use a BooleanFilter (or ChainedFilter in 4.x) alongside your
> BooleanQuery? Seems more logical and I suspect would solve the problem.
> Caching filters can be good too, depending on how often your data changes.
> See CachingWrapperFilter.
>
> --
> Ian.
>
>
> On Tue, Mar 10, 2015 at 12:45 PM, Chris Bamford wrote:
>
> > Hi,
> >
> > I have an index of 30 docs, 20 of which have an owner field of "UserA"
> > and 10 of "UserB".
> >
> > I also have a query which consists of:
> >
> > BooleanQuery:
> > -- Clause 1: TermQuery
> > -- Clause 2: FilteredQuery
> > ----- Branch 1: MatchAllDocsQuery()
> > ----- Branch 2: MyNDVFilter
> >
> > I execute my search as follows:
> >
> > searcher.search(booleanQuery,
> >                 new TermFilter(new Term("owner", "UserA")),
> >                 50);
> >
> > The TermFilter's job is to reduce the number of searchable documents
> > from 30 to 20, which it does for all clauses of the BooleanQuery except
> > for MyNDVFilter, which iterates through the full 30 docs, 10 needlessly.
> > How can I restrict it so it behaves the same as the other query branches?
> >
> > MyNDVFilter source code:
> >
> > public class MyNDVFilter extends Filter {
> >
> >   private String fieldName;
> >   private String matchTag;
> >
> >   public MyNDVFilter(String ndvFieldName, String matchTag) {
> >     this.fieldName = ndvFieldName;
> >     this.matchTag = matchTag;
> >   }
> >
> >   @Override
> >   public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
> >       throws IOException {
> >
> >     AtomicReader reader = context.reader();
> >     int maxDoc = reader.maxDoc();
> >     final FixedBitSet bitSet = new FixedBitSet(maxDoc);
> >     BinaryDocValues ndv = reader.getBinaryDocValues(fieldName);
> >
> >     if (ndv != null) {
> >       // Note: acceptDocs is never consulted, so every doc is visited.
> >       for (int i = 0; i < maxDoc; i++) {
> >         BytesRef br = ndv.get(i);
> >         if (br.length > 0) {
> >           String strval = br.utf8ToString();
> >           if (strval.equals(matchTag)) {
> >             bitSet.set(i);
> >             System.out.println("MyNDVFilter >> " + matchTag
> >                 + " matched " + i + " [" + strval + "]");
> >           }
> >         }
> >       }
> >     }
> >
> >     return new DVDocSetId(bitSet); // just wraps a FixedBitSet
> >   }
> > }
> >
> >
> > Chris Bamford
> > Senior Developer
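
For reference, here is a minimal, Lucene-free sketch of the acceptDocs check suggested in the reply at the top of this message. It is only an illustration under stated assumptions: the Bits interface below is a hypothetical stand-in for Lucene's own, the String array stands in for the per-document NDV values, and AcceptDocsSketch and matchAccepted are made-up names. It also assumes that a null acceptDocs means "every document is accepted", which I believe is how Lucene 4.x behaves, but that is worth confirming against the javadoc. Whether the outer TermFilter really arrives as acceptDocs still needs to be verified, as noted in the reply.

```java
import java.util.BitSet;

/** Hypothetical stand-in for Lucene's Bits interface, for illustration only. */
interface Bits {
    boolean get(int index);
    int length();
}

class AcceptDocsSketch {

    /**
     * Returns the doc ids whose value equals matchTag, skipping every
     * document that acceptDocs rejects before its value is read.
     */
    static BitSet matchAccepted(String[] values, Bits acceptDocs, String matchTag) {
        BitSet matches = new BitSet(values.length);
        for (int doc = 0; doc < values.length; doc++) {
            // Assumption: a null acceptDocs means no document is excluded.
            if (acceptDocs != null && !acceptDocs.get(doc)) {
                continue; // rejected upstream, so its value is never read
            }
            if (matchTag.equals(values[doc])) {
                matches.set(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // 30 docs as in the thread: ids 0-19 play "UserA", ids 20-29 "UserB".
        String[] values = new String[30];
        values[5] = "tagA";
        values[25] = "tagA";

        // Accept only the first 20 docs, mimicking the owner filter.
        Bits userAOnly = new Bits() {
            @Override public boolean get(int index) { return index < 20; }
            @Override public int length() { return 30; }
        };

        System.out.println(matchAccepted(values, userAOnly, "tagA")); // prints {5}
    }
}
```

Doc 25 carries the matching tag but is never examined, because the acceptDocs check runs before the value is read. In the real filter the same check would sit at the top of the loop in getDocIdSet, before the ndv.get(i) call.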