lucene-java-user mailing list archives

From Matt Ronge <>
Subject Re: Pre-filtering for expensive query
Date Sat, 30 Aug 2008 16:22:50 GMT

On Aug 30, 2008, at 6:13 AM, Paul Elschot wrote:

> On Saturday 30 August 2008 03:34:01, Matt Ronge wrote:
>> Hi all,
>> I am working on implementing a new Query, Weight and Scorer that is
>> expensive to run. I'd like to limit the number of documents I run
>> this query on by first building a candidate set of documents with a
>> boolean query. Once I have that candidate set, I was hoping I could
>> build a filter off of it, and issue that along with my expensive
>> query. However, after reading the code I see that filtering is done
>> during the search, and not beforehand.
> Correct. I suppose you mean the filtering code in IndexSearcher?

Yes, that's exactly what I mean.

>> So my initial boolean query
>> won't help in limiting the number of documents scored by my expensive
>> query.
> The trick of filtering is the use of skipTo() on both the filter and
> the scorer to skip superfluous work as much as possible.
> So when you make your scorer implement skipTo() efficiently,
> filtering it should reduce the amount of scoring done.
> Implementing skipTo() efficiently is normally done by using
> TermScorer.skipTo() on the leaves of a scorer structure. So,
> in case you implement your own TermScorer, take a serious
> look at TermScorer.skipTo().
> Normally, score value computations are not the bottleneck,
> but accessing the index is, and this is where skipTo() does
> the real work. At the moment avoiding score value computations
> is a nice extra.
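
To make the skipTo() point concrete, here is a minimal, self-contained Java sketch. It does not use Lucene: SortedDocIterator is a hypothetical stand-in for a leaf such as TermScorer, loosely modeled on the next()/doc()/skipTo(target) contract of the Lucene 2.x iterators, and it assumes one term's postings fit in a sorted in-memory int array. The real TermScorer.skipTo() relies on skip data stored in the index rather than on binary search; the sketch only shows why jumping to a target beats stepping through every posting with next().

```java
import java.util.Arrays;

/**
 * Hypothetical leaf iterator over one term's postings, held as a sorted
 * int array of doc ids. Loosely modeled on the Lucene 2.x
 * next()/doc()/skipTo() contract; this is not Lucene code.
 */
class SortedDocIterator {
    private final int[] docs; // doc ids, strictly increasing
    private int pos = -1;     // index of the current doc; -1 before the first move

    SortedDocIterator(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Current doc id; valid only after next() or skipTo() returned true. */
    int doc() {
        return docs[pos];
    }

    /** Steps to the following posting, visiting every entry in turn. */
    boolean next() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length;
    }

    /**
     * Moves to the first doc >= target that lies beyond the current position.
     * Binary search jumps over the intermediate postings instead of
     * calling next() on each of them.
     */
    boolean skipTo(int target) {
        int from = pos + 1;
        if (from >= docs.length) {
            pos = docs.length; // exhausted
            return false;
        }
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        // A negative result encodes -(insertion point) - 1, i.e. the first doc > target.
        pos = idx >= 0 ? idx : -idx - 1;
        return pos < docs.length;
    }
}
```

For example, on postings {1, 3, 7, 12, 20}, skipTo(5) lands directly on 7 without visiting 1 and 3, and a second skipTo(7) moves past the current 7 to 12, following the "beyond the current" rule.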

I was not aware of this. Where can I find the code that uses the
filter to determine what values to feed to skipTo (I'm trying to get a
better understanding of the Lucene source)?
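
Earlier in the thread Paul points to the filtering code in IndexSearcher. How that code looks depends on the Lucene version, so what follows is a hypothetical, simplified Java sketch of the filter/scorer "leapfrog" he describes, not Lucene's actual source. A candidate set from a cheap first-pass query is held in a java.util.BitSet (standing in for a Filter), and ExpensiveScorer is a hypothetical scorer whose matches come from a sorted array. Each side calls skipTo() with the other side's current document, so the costly score() runs only on documents both accept. DocIterator, BitSetIterator, ExpensiveScorer and FilteredSearch are all illustrative names.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical iterator contract, loosely modeled on Lucene 2.x next()/doc()/skipTo(). */
interface DocIterator {
    int doc();                  // current doc id
    boolean next();             // advance by one
    boolean skipTo(int target); // first doc >= target beyond the current one
}

/** Candidate set from a cheap first-pass query, standing in for a Filter. */
class BitSetIterator implements DocIterator {
    private final BitSet bits;
    private int doc = -1;
    private boolean exhausted = false;

    BitSetIterator(BitSet bits) { this.bits = bits; }

    public int doc() { return doc; }

    public boolean next() { return skipTo(doc + 1); }

    public boolean skipTo(int target) {
        if (exhausted) return false;
        int found = bits.nextSetBit(Math.max(target, doc + 1));
        if (found < 0) { exhausted = true; return false; }
        doc = found;
        return true;
    }
}

/** Hypothetical costly scorer; its matches are a sorted array of doc ids. */
class ExpensiveScorer implements DocIterator {
    private final int[] docs;
    private int pos = -1;
    int scoreCalls = 0; // how many times the costly computation ran

    ExpensiveScorer(int[] sortedDocs) { this.docs = sortedDocs; }

    public int doc() { return docs[pos]; }

    public boolean next() {
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }

    public boolean skipTo(int target) {
        int from = pos + 1;
        if (from >= docs.length) { pos = docs.length; return false; }
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        pos = idx >= 0 ? idx : -idx - 1;
        return pos < docs.length;
    }

    float score() {
        scoreCalls++;
        return 1.0f / (1 + doc()); // placeholder for the expensive part
    }
}

class FilteredSearch {
    /** Leapfrogs filter and scorer; scores only docs accepted by both. */
    static Map<Integer, Float> search(DocIterator filter, ExpensiveScorer scorer) {
        Map<Integer, Float> hits = new LinkedHashMap<>();
        boolean more = filter.next() && scorer.skipTo(filter.doc());
        while (more) {
            int f = filter.doc();
            int s = scorer.doc();
            if (s == f) {
                hits.put(s, scorer.score()); // the only place score() is called
                more = filter.next() && scorer.skipTo(filter.doc());
            } else if (s < f) {
                more = scorer.skipTo(f);     // scorer is behind: jump it forward
            } else {
                more = filter.skipTo(s);     // filter is behind: jump it forward
            }
        }
        return hits;
    }
}
```

With candidates {2, 5, 9, 40} and a scorer matching docs 1 through 12, score() runs three times (for 2, 5 and 9) instead of twelve. The skipTo() calls still cost something; in this sketch that is the binary search, standing in for the index access Paul identifies as the usual bottleneck.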

>> Or should I just implement something myself in a custom scorer?
> If you have a better way than skipTo(), or something that improves
> on this issue of allowing a Filter as a clause in a BooleanQuery,
> please let us know.

Thanks. If the skipTo() approach doesn't work, I'll take a look at this.

