lucene-java-user mailing list archives

From Chris Bamford <ch...@bammers.net>
Subject Re: Filtering question
Date Wed, 11 Mar 2015 17:19:40 GMT
Hi Shai

I thought that might be what acceptDocs was for, but in my case it is null, so I get an NPE
if I try your suggestion.

What am I doing wrong? I'd really like to understand this stuff.
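
For reference, here is a minimal sketch of a null-tolerant version of the suggested loop. As far as I understand the Lucene 4.x convention, a null acceptDocs means that no document is excluded (getLiveDocs() similarly returns null when a segment has no deletions), so the check has to allow for null before calling get(). The Bits interface below is only a hypothetical stand-in for org.apache.lucene.util.Bits so that the example runs without Lucene on the classpath; AcceptDocsSketch, bitsOf and acceptedDocs are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class AcceptDocsSketch {

    /** Hypothetical stand-in for org.apache.lucene.util.Bits (assumption, not the real class). */
    interface Bits {
        boolean get(int index);
        int length();
    }

    /** Illustrative helper: wraps a boolean array as a Bits instance. */
    static Bits bitsOf(final boolean... flags) {
        return new Bits() {
            @Override public boolean get(int index) { return flags[index]; }
            @Override public int length() { return flags.length; }
        };
    }

    /**
     * Returns the doc IDs in [0, maxDoc) that the filter should examine.
     * A null acceptDocs is treated as "every document is acceptable".
     */
    static List<Integer> acceptedDocs(int maxDoc, Bits acceptDocs) {
        List<Integer> accepted = new ArrayList<>();
        for (int i = 0; i < maxDoc; i++) {
            if (acceptDocs != null && !acceptDocs.get(i)) {
                continue; // excluded upstream (deleted or filtered out)
            }
            accepted.add(i); // a real filter would read the doc value for i here
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(acceptedDocs(4, null));                              // [0, 1, 2, 3]
        System.out.println(acceptedDocs(4, bitsOf(true, false, true, false))); // [0, 2]
    }
}
```

In a real filter the same guard would sit at the top of the per-document loop in getDocIdSet. Note that it only avoids the NPE: if acceptDocs really is null at that point, the loop still visits every document in the segment.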

Thanks 

Chris


> On 11 Mar 2015, at 13:05, Shai Erera <serera@gmail.com> wrote:
> 
> I don't see that you use acceptDocs in your MyNDVFilter. I think it would
> return false for all userB docs, but you should confirm that.
> 
> Anyway, because you use an NDV field, you can't automatically skip
> unrelated documents, but rather your code would look something like:
> 
> for (int i = 0; i < reader.maxDoc(); i++) {
>  if (!acceptDocs.get(i)) {
>    continue;
>  }
>  // document is accepted, read values
>  ...
> }
> 
> Shai
> 
>> On Wed, Mar 11, 2015 at 1:25 PM, Ian Lea <ian.lea@gmail.com> wrote:
>> 
>> Can you use a BooleanFilter (or ChainedFilter in 4.x) alongside your
>> BooleanQuery?   Seems more logical and I suspect would solve the problem.
>> Caching filters can be good too, depending on how often your data changes.
>> See CachingWrapperFilter.
>> 
>> --
>> Ian.
>> 
>> 
>> On Tue, Mar 10, 2015 at 12:45 PM, Chris Bamford <cbamford@mimecast.com>
>> wrote:
>> 
>>> 
>>> Hi,
>>> 
>>> I have an index of 30 docs, 20 of which have an owner field of "UserA"
>>> and 10 of "UserB".
>>> I also have a query which consists of:
>>> 
>>> BooleanQuery:
>>> -- Clause 1: TermQuery
>>> -- Clause 2: FilteredQuery
>>> ----- Branch 1: MatchAllDocsQuery()
>>> ----- Branch 2: MyNDVFilter
>>> 
>>> I execute my search as follows:
>>> 
>>> searcher.search(booleanQuery,
>>>                 new TermFilter(new Term("owner", "UserA")),
>>>                 50);
>>> 
>>> The TermFilter's job is to reduce the number of searchable documents
>>> from 30 to 20, which it does for all clauses of the BooleanQuery except
>>> for MyNDVFilter, which iterates through the full 30 docs, 10 needlessly.  How
>>> can I restrict it so it behaves the same as the other query branches?
>>> 
>>> MyNDVFilter source code:
>>> 
>>> public class MyNDVFilter extends Filter {
>>> 
>>>     private String fieldName;
>>>     private String matchTag;
>>> 
>>>     public MyNDVFilter(String ndvFieldName, String matchTag) {
>>>         this.fieldName = ndvFieldName;
>>>         this.matchTag = matchTag;
>>>     }
>>> 
>>>     @Override
>>>     public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs) throws IOException {
>>> 
>>>         AtomicReader reader = context.reader();
>>>         int maxDoc = reader.maxDoc();
>>>         final FixedBitSet bitSet = new FixedBitSet(maxDoc);
>>>         BinaryDocValues ndv = reader.getBinaryDocValues(fieldName);
>>> 
>>>         if (ndv != null) {
>>>             // acceptDocs is never consulted, so every doc in the segment is visited
>>>             for (int i = 0; i < maxDoc; i++) {
>>>                 BytesRef br = ndv.get(i);
>>>                 if (br.length > 0) {
>>>                     String strval = br.utf8ToString();
>>>                     if (strval.equals(matchTag)) {
>>>                         bitSet.set(i);
>>>                         System.out.println("MyNDVFilter >> " + matchTag + " matched " + i + " [" + strval + "]");
>>>                     }
>>>                 }
>>>             }
>>>         }
>>> 
>>>         return new DVDocSetId(bitSet);    // just wraps a FixedBitSet
>>>     }
>>> }
>>> 
>>> 
>>> 
>>> Chris Bamford


