lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Pre-filtering for expensive query
Date Sat, 30 Aug 2008 20:06:26 GMT
On Saturday 30 August 2008 18:22:50, Matt Ronge wrote:
> On Aug 30, 2008, at 6:13 AM, Paul Elschot wrote:
> > On Saturday 30 August 2008 03:34:01, Matt Ronge wrote:
> >> Hi all,
> >>
> >> I am working on implementing a new Query, Weight and Scorer that
> >> is expensive to run. I'd like to limit the number of documents I
> >> run this query on by first building a candidate set of documents
> >> with a boolean query. Once I have that candidate set, I was hoping
> >> I could build a filter off of it, and issue that along with my
> >> expensive query. However, after reading the code I see that
> >> filtering is done during the search, and not beforehand.
> >
> > Correct. I suppose you mean the filtering code in IndexSearcher?
>
> Yes, that's exactly what I mean.
>
> >> So my initial boolean query
> >> won't help in limiting the number of documents scored by my
> >> expensive query.
> >
> > The trick of filtering is the use of skipTo() on both the filter
> > and the scorer to skip superfluous work as much as possible.
> > So when you make your scorer implement skipTo() efficiently,
> > filtering it should reduce the amount of scoring done.
> >
> > Implementing skipTo() efficiently is normally done by using
> > TermScorer.skipTo() on the leaves of a scorer structure. So,
> > in case you implement your own TermScorer, take a serious
> > look at TermScorer.skipTo().
> >
> > Normally, score value computations are not the bottleneck,
> > but accessing the index is, and this is where skipTo() does
> > the real work. At the moment avoiding score value computations
> > is a nice extra.
>
> I was not aware of this. Where can I find the code that uses the
> filter to determine what values to feed to skipTo (I'm trying to get
> a better understanding of the Lucene source)?

It's in the same filtering code in IndexSearcher.
ConjunctionScorer.skipTo() does much the same thing for
any number of scorers.
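
To make the idea concrete, here is a minimal sketch of how a filter and a
scorer can drive each other with skipTo(). It is not the Lucene code: the
SortedDocs class, the NO_MORE_DOCS sentinel and the intersect() method are
hypothetical stand-ins, and a binary search over an int array plays the role
that skip lists play in a real index. The point is only that whichever side
is behind gets skipped to the other side's current document, so the
expensive side is never asked about documents the filter rules out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a doc-id iterator; not a Lucene class. */
final class SortedDocs {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // assumed sentinel
    private final int[] docs; // strictly increasing doc ids
    private int pos = 0;

    SortedDocs(int... docs) { this.docs = docs; }

    /** Moves to the first doc id >= target and returns it, or NO_MORE_DOCS. */
    int skipTo(int target) {
        int lo = pos, hi = docs.length;
        // Binary search over the remaining ids; never moves backwards.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        pos = lo;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

final class SkipToSketch {
    /** Returns the doc ids present in both the filter and the scorer. */
    static List<Integer> intersect(SortedDocs filter, SortedDocs scorer) {
        List<Integer> hits = new ArrayList<>();
        int f = filter.skipTo(0);
        int s = scorer.skipTo(0);
        while (f != SortedDocs.NO_MORE_DOCS && s != SortedDocs.NO_MORE_DOCS) {
            if (f == s) {
                hits.add(f);               // only here would a score be computed
                f = filter.skipTo(f + 1);
                s = scorer.skipTo(s + 1);
            } else if (f < s) {
                f = filter.skipTo(s);      // filter is behind: jump it forward
            } else {
                s = scorer.skipTo(f);      // scorer is behind: skip unfiltered docs
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedDocs filter = new SortedDocs(3, 50, 900);
        SortedDocs scorer = new SortedDocs(1, 3, 7, 20, 50, 51, 400, 900, 950);
        System.out.println(intersect(filter, scorer)); // prints [3, 50, 900]
    }
}
```

In this toy run the scorer is never positioned on 20 or 400 at all, which
is the saving you get when the underlying skipTo() is cheap.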

>
> >> Or should I just implement something myself in a custom scorer?
> >
> > In case you have a better way than skipTo(), or something
> > to improve on this issue, which would allow a Filter as a clause
> > in BooleanQuery: https://issues.apache.org/jira/browse/LUCENE-1345
> > let us know.
>
> Thanks, if the skipTo approach doesn't work, I'll take a look at
> this.

For the moment, Andrzej's suggestion to use FilteredQuery as a clause
could well be good enough.
By the way, FilteredQuery also contains a filtering scorer under the hood;
you could take a look there, too.

Regards,
Paul Elschot

