lucene-java-user mailing list archives

From Paul Elschot <>
Subject Re: Pre-filtering for expensive query
Date Sat, 30 Aug 2008 11:13:20 GMT
On Saturday 30 August 2008 03:34:01, Matt Ronge wrote:
> Hi all,
> I am working on implementing a new Query, Weight and Scorer that is
> expensive to run. I'd like to limit the number of documents I run
> this query on by first building a candidate set of documents with a
> boolean query. Once I have that candidate set, I was hoping I could
> build a filter off of it, and issue that along with my expensive
> query. However, after reading the code I see that filtering is done
> during the search, and not before hand.

Correct. I suppose you mean the filtering code in IndexSearcher?

> So my initial boolean query 
> won't help in limiting the number of documents scored by my expensive
> query.

The trick of filtering is to call skipTo() on both the filter and
the scorer, so that superfluous work is skipped as much as possible.
So if your scorer implements skipTo() efficiently, filtering it
should reduce the amount of scoring done.
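A minimal sketch of the leapfrog loop a filtered search can use. It assumes a simplified, hypothetical `DocIterator` interface modelled on the doc()/next()/skipTo() contract of Lucene 2.x scorers; it is not Lucene's actual class hierarchy, and the two arrays stand in for a filter's candidate doc ids and the documents an expensive query matches.

```java
import java.util.ArrayList;
import java.util.List;

public class FilteredSearchSketch {

    /** Hypothetical simplified iterator, loosely modelled on Lucene 2.x Scorer/DocIdSetIterator. */
    interface DocIterator {
        int doc();                   // current doc id; valid only after next()/skipTo() returned true
        boolean next();              // advance one document; false when exhausted
        boolean skipTo(int target);  // advance to the first doc >= target (always moves forward)
    }

    /** Iterator over a sorted array of doc ids; skipTo() here is a plain linear scan. */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int pos = -1;

        ArrayIterator(int... docs) { this.docs = docs; }

        public int doc() { return docs[pos]; }

        public boolean next() { return ++pos < docs.length; }

        public boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (docs[pos] < target);
            return true;
        }
    }

    /** Leapfrog intersection: each side skips past documents the other side cannot match. */
    static List<Integer> intersect(DocIterator filter, DocIterator scorer) {
        List<Integer> hits = new ArrayList<>();
        if (!filter.next() || !scorer.next()) return hits;
        int f = filter.doc(), s = scorer.doc();
        while (true) {
            if (f == s) {
                hits.add(s);                  // a real search would call score() and collect here
                if (!filter.next()) break;
                f = filter.doc();
                if (!scorer.skipTo(f)) break;
                s = scorer.doc();
            } else if (f > s) {
                if (!scorer.skipTo(f)) break; // scorer jumps ahead to the filter's next candidate
                s = scorer.doc();
            } else {
                if (!filter.skipTo(s)) break; // filter jumps ahead to the scorer's next match
                f = filter.doc();
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        DocIterator filter = new ArrayIterator(3, 17, 42, 99);
        DocIterator scorer = new ArrayIterator(1, 3, 5, 42, 77, 99, 120);
        System.out.println(intersect(filter, scorer)); // prints [3, 42, 99]
    }
}
```

Note that the scorer is only ever asked to move to documents at or beyond a filter candidate, but how much index work that saves depends on its skipTo(): the linear version above still steps through every intermediate document.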

Implementing skipTo() efficiently is normally done by using
TermScorer.skipTo() on the leaves of a scorer structure. So
if you implement your own TermScorer, take a serious look
at TermScorer.skipTo().
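To show why an efficient leaf skipTo() matters, here is a sketch of a leaf iterator over one term's sorted doc ids. In Lucene the fast path goes through the postings' skip data; here a galloping (exponential) search followed by a binary search over an in-memory array is swapped in as a stand-in. `PostingsIterator`, `linearSkipTo()` and the `reads` counter are hypothetical names, and `reads` is only a rough proxy for index accesses.

```java
public class GallopingSkipSketch {

    /** Hypothetical leaf iterator over a sorted array of doc ids for one term. */
    static final class PostingsIterator {
        private final int[] docs;
        private int pos = -1;
        int reads = 0; // number of array probes, a rough proxy for index accesses

        PostingsIterator(int... docs) { this.docs = docs; }

        private int read(int i) { reads++; return docs[i]; }

        int doc() { return docs[pos]; }

        boolean next() { return ++pos < docs.length; }

        /** Naive skipTo(): reads every entry until one is >= target. */
        boolean linearSkipTo(int target) {
            while (++pos < docs.length) {
                if (read(pos) >= target) return true;
            }
            return false;
        }

        /** Galloping skipTo(): probe at growing distances, then binary search the last gap. */
        boolean skipTo(int target) {
            int lo = pos + 1; // first index not yet passed
            int hi = lo;      // next index to probe
            int step = 1;
            while (hi < docs.length && read(hi) < target) {
                lo = hi + 1;  // everything up to hi is known to be < target
                hi = lo + step;
                step <<= 1;
            }
            // Lower bound: first index in [lo, right) whose doc is >= target.
            int left = lo, right = Math.min(hi + 1, docs.length);
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (read(mid) < target) left = mid + 1; else right = mid;
            }
            pos = left;
            return pos < docs.length;
        }
    }

    /** Doc ids 0, 2, 4, ..., 2 * (n - 1): a term that occurs in every other document. */
    static int[] evenDocs(int n) {
        int[] d = new int[n];
        for (int i = 0; i < n; i++) d[i] = 2 * i;
        return d;
    }

    public static void main(String[] args) {
        int[] docs = evenDocs(10000);
        PostingsIterator linear = new PostingsIterator(docs);
        PostingsIterator galloping = new PostingsIterator(docs);
        linear.linearSkipTo(15000);
        galloping.skipTo(15000);
        System.out.println("linear: doc " + linear.doc() + " after " + linear.reads + " reads"); // 7501 reads
        System.out.println("galloping: doc " + galloping.doc() + " after " + galloping.reads + " reads");
    }
}
```

Both iterators land on the same document, but the galloping version touches a logarithmic number of entries instead of every one in between, which is the kind of saving that a scorer built on efficient leaf skipTo() calls inherits.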

Normally, computing score values is not the bottleneck;
accessing the index is, and that is where skipTo() does
the real work. For now, avoiding score value computations
is a nice extra.

>   Has anyone done any work into restricting the set of docs that a
> query operates on?

Yes, Filters.
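The plan of first running a cheap BooleanQuery and turning its hits into a filter fits this model, because the candidate set only needs to offer next() and skipTo(). A minimal sketch, assuming the candidate doc ids have already been collected into a java.util.BitSet; how they are collected, and the `BitSetIterator` class itself, are hypothetical and not Lucene API (Lucene's own filter classes differ between versions). BitSet.nextSetBit() gives a cheap skipTo().

```java
import java.util.BitSet;

public class BitSetFilterSketch {

    /** Hypothetical doc-id iterator over a BitSet of candidate documents. */
    static final class BitSetIterator {
        private final BitSet bits;
        private int doc = -1;
        private boolean exhausted = false;

        BitSetIterator(BitSet bits) { this.bits = bits; }

        int doc() { return doc; }

        boolean next() { return skipTo(doc + 1); }

        /** Jump to the first candidate >= target, always moving past the current doc. */
        boolean skipTo(int target) {
            if (exhausted) return false;
            int found = bits.nextSetBit(Math.max(target, doc + 1));
            if (found < 0) {
                exhausted = true; // no more candidates; stay exhausted on later calls
                return false;
            }
            doc = found;
            return true;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the hits of a cheap pre-filtering query.
        BitSet candidates = new BitSet();
        for (int d : new int[] {3, 17, 42, 99}) candidates.set(d);

        BitSetIterator it = new BitSetIterator(candidates);
        if (it.skipTo(20)) System.out.println(it.doc()); // prints 42
    }
}
```

Used as the filter side of a filtered search, such an iterator lets the expensive scorer be advanced with skipTo() directly to candidate documents instead of visiting everything in between.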

> Or should I just implement something myself in a custom scorer?

If you have a better way than skipTo(), or something to
improve on the issue to allow a Filter as a clause in
BooleanQuery, let us know.

Paul Elschot
