lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Filtering question
Date Wed, 11 Mar 2015 19:07:18 GMT
Hi,

BooleanQuery:
-- Clause 1: TermQuery
-- Clause 2: FilteredQuery
----- Branch 1: MatchAllDocsQuery()
----- Branch 2: MyNDVFilter


Why does it look like this? Clause 2 should simply be ConstantScoreQuery(MyNDVFilter).
In that case the BooleanQuery will execute more efficiently: with 2 MUST clauses it
will leap-frog between them.
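
To illustrate what "leap-frog" means for two MUST clauses, here is a minimal sketch in plain
Java. It does not use Lucene: SortedDocs, advance() and intersect() are hypothetical stand-ins
for a doc-id iterator and a conjunction, and the doc ids are made up. Each side jumps straight
to the other side's current doc instead of visiting every doc.

```java
import java.util.ArrayList;
import java.util.List;

final class LeapFrogSketch {

    /** Hypothetical stand-in for a doc-id iterator over a sorted array (not a Lucene class). */
    static final class SortedDocs {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int pos = -1;

        SortedDocs(int... docs) {
            this.docs = docs;
        }

        /** Moves forward to the first doc >= target; returns it, or NO_MORE_DOCS if exhausted. */
        int advance(int target) {
            do {
                pos++;
            } while (pos < docs.length && docs[pos] < target);
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Returns the docs present in both iterators, letting each side skip ahead to the other. */
    static List<Integer> intersect(SortedDocs a, SortedDocs b) {
        List<Integer> hits = new ArrayList<>();
        int docA = a.advance(0);
        int docB = b.advance(0);
        while (docA != SortedDocs.NO_MORE_DOCS && docB != SortedDocs.NO_MORE_DOCS) {
            if (docA == docB) {
                hits.add(docA);               // both clauses match this doc
                docA = a.advance(docA + 1);
                docB = b.advance(docB + 1);
            } else if (docA < docB) {
                docA = a.advance(docB);       // a is behind: leap to b's position
            } else {
                docB = b.advance(docA);       // b is behind: leap to a's position
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedDocs termMatches = new SortedDocs(1, 3, 5, 7, 9, 20);
        SortedDocs filterMatches = new SortedDocs(3, 4, 9, 10, 20, 25);
        System.out.println(intersect(termMatches, filterMatches)); // docs 3, 9 and 20
    }
}
```

The more selective clause drives the iteration, so the other clause only has to answer for
the docs it is asked about.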

The reason for this behavior is how FilteredQuery executes: a filter is seen as cheap,
so it is applied down low. If it supports bits() access (instead of only an iterator), it
will be passed as acceptDocs to the query (here, a MatchAllDocsQuery).

If you also apply the TermFilter on the top-level IndexSearcher (which internally rewrites
to FilteredQuery(query, filter)), the documents matching the TermFilter will be passed as
acceptDocs to your BooleanQuery, which will also pass them down to MyNDVFilter.
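
On the filter side, acceptDocs has to be consulted explicitly, and it may be null, which means
"no restriction, every doc is accepted". A minimal sketch of that null-safe check in plain
Java follows. It does not use Lucene: DocBits is a hypothetical stand-in for Bits, the
String[] stands in for the per-document doc values, and the data is made up.

```java
import java.util.BitSet;

final class AcceptDocsSketch {

    /** Hypothetical stand-in for random-access accept bits (not a Lucene class). */
    interface DocBits {
        boolean get(int docId);
    }

    /**
     * Collects the docs whose tag equals matchTag, skipping docs the caller has ruled out.
     * A null acceptDocs is treated as "all docs accepted".
     */
    static BitSet matchTag(String[] tagByDoc, String matchTag, DocBits acceptDocs) {
        BitSet matches = new BitSet(tagByDoc.length);
        for (int doc = 0; doc < tagByDoc.length; doc++) {
            if (acceptDocs != null && !acceptDocs.get(doc)) {
                continue; // ruled out upstream: do not even read the value
            }
            if (matchTag.equals(tagByDoc[doc])) {
                matches.set(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] tags = {"red", "blue", "red", "red"};
        DocBits onlyEvenDocs = doc -> doc % 2 == 0;
        System.out.println(matchTag(tags, "red", onlyEvenDocs)); // docs 0 and 2
        System.out.println(matchTag(tags, "red", null));         // docs 0, 2 and 3
    }
}
```

Without the null check, calling get() on a null acceptDocs fails with a NullPointerException.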

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Chris Bamford [mailto:chris@chrisbamford.plus.com]
> Sent: Wednesday, March 11, 2015 6:39 PM
> To: java-user@lucene.apache.org
> Subject: Re: Filtering question
> 
> Additional -
> I'm on lucene 4.10.2
> 
> If I use a BooleanFilter as per Ian's suggestion I still get a null acceptDocs
> being passed to my NDV filter.
> 
> 
> Sent from my iPhone
> 
> > On 11 Mar 2015, at 17:19, Chris Bamford <chris@bammers.net> wrote:
> >
> > Hi Shai
> >
> > I thought that might be what acceptDocs was for, but in my case it is null
> > and throws an NPE if I try your suggestion.
> >
> > What am I doing wrong? I'd like to really understand this stuff ..
> >
> > Thanks
> >
> > Chris
> >
> >
> >> On 11 Mar 2015, at 13:05, Shai Erera <serera@gmail.com> wrote:
> >>
> >> I don't see that you use acceptDocs in your MyNDVFilter. I think it
> >> would return false for all userB docs, but you should confirm that.
> >>
> >> Anyway, because you use an NDV field, you can't automatically skip
> >> unrelated documents, but rather your code would look something like:
> >>
> >> for (int i = 0; i < reader.maxDoc(); i++) {
> >>   if (!acceptDocs.get(i)) {
> >>     continue;
> >>   }
> >>   // document is accepted, read values
> >>   ...
> >> }
> >>
> >> Shai
> >>
> >>> On Wed, Mar 11, 2015 at 1:25 PM, Ian Lea <ian.lea@gmail.com> wrote:
> >>>
> >>> Can you use a BooleanFilter (or ChainedFilter in 4.x) alongside your
> >>> BooleanQuery? Seems more logical and I suspect would solve the problem.
> >>> Caching filters can be good too, depending on how often your data changes.
> >>> See CachingWrapperFilter.
> >>>
> >>> --
> >>> Ian.
> >>>
> >>>
> >>> On Tue, Mar 10, 2015 at 12:45 PM, Chris Bamford <cbamford@mimecast.com> wrote:
> >>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> I have an index of 30 docs, 20 of which have an owner field of "UserA"
> >>>> and 10 of "UserB".
> >>>> I also have a query which consists of:
> >>>>
> >>>> BooleanQuery:
> >>>> -- Clause 1: TermQuery
> >>>> -- Clause 2: FilteredQuery
> >>>> ----- Branch 1: MatchAllDocsQuery()
> >>>> ----- Branch 2: MyNDVFilter
> >>>>
> >>>> I execute my search as follows:
> >>>>
> >>>> searcher.search(booleanQuery,
> >>>>                 new TermFilter(new Term("owner", "UserA")),
> >>>>                 50);
> >>>>
> >>>> The TermFilter's job is to reduce the number of searchable
> >>>> documents from 30 to 20, which it does for all clauses of the
> >>>> BooleanQuery except for MyNDVFilter, which iterates through the
> >>>> full 30 docs, 10 needlessly.
> >>>> How can I restrict it so it behaves the same as the other query branches?
> >>>>
> >>>> MyNDVFilter source code:
> >>>>
> >>>> public class MyNDVFilter extends Filter {
> >>>>
> >>>>     private final String fieldName;
> >>>>     private final String matchTag;
> >>>>
> >>>>     // Constructor name must match the class name (was TagFilter).
> >>>>     public MyNDVFilter(String ndvFieldName, String matchTag) {
> >>>>         this.fieldName = ndvFieldName;
> >>>>         this.matchTag = matchTag;
> >>>>     }
> >>>>
> >>>>     @Override
> >>>>     public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs)
> >>>>             throws IOException {
> >>>>
> >>>>         AtomicReader reader = context.reader();
> >>>>         int maxDoc = reader.maxDoc();
> >>>>         final FixedBitSet bitSet = new FixedBitSet(maxDoc);
> >>>>         BinaryDocValues ndv = reader.getBinaryDocValues(fieldName);
> >>>>
> >>>>         // Note: acceptDocs is not consulted here, so every doc in the
> >>>>         // segment is visited.
> >>>>         if (ndv != null) {
> >>>>             for (int i = 0; i < maxDoc; i++) {
> >>>>                 BytesRef br = ndv.get(i);
> >>>>                 if (br.length > 0) {
> >>>>                     String strval = br.utf8ToString();
> >>>>                     if (strval.equals(matchTag)) {
> >>>>                         bitSet.set(i);
> >>>>                         System.out.println("MyNDVFilter >> " + matchTag
> >>>>                                 + " matched " + i + " [" + strval + "]");
> >>>>                     }
> >>>>                 }
> >>>>             }
> >>>>         }
> >>>>
> >>>>         return new DVDocSetId(bitSet);    // just wraps a FixedBitSet
> >>>>     }
> >>>> }
> >>>>
> >>>>
> >>>>
> >>>> Chris Bamford, Senior Developer
> >>>> m: +44 7860 405292  p: +44 207 847 8700  w: www.mimecast.com
> >>>> Address: <http://www.mimecast.com/About-us/Contact-us/>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> 