lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Performance of IndexSearcher.explain(Query)
Date Tue, 20 Nov 2012 23:40:15 GMT
On Tue, Nov 20, 2012 at 6:18 PM, Trejkaz <trejkaz@trypticon.org> wrote:

> I have a feature I wanted to implement which required a quick way to
> check whether an individual document matched a query or not.
>
> IndexSearcher.explain seemed to be a good fit for this.
>
> The query I tested was just a BooleanQuery with two TermQuery inside
> it, both with MUST. I ran an empty query to match all documents and
> then ran the new code against each document. Within 40,743 documents,
> 1,072 documents matched the query.
>
> I got the times of around 15.5s doing this. After noticing that
> ConstantScoreQuery now works with Query in addition to Filter, I
> started using it as well, which further reduced this time to 13.6s.
>
> There is a comment like this on the explain method, though:
>
>     "Computing an explanation is as expensive as executing
>      the query over the entire index."
>
> So I wanted to test this. To do this, I made a collector which did
> nothing but look for the single item being matched.
>
> Times for searching the whole index using this collector came to
> around 30.9s, which is more than twice as slow as using explain (times
> didn't vary at all if I used ConstantScoreQuery here, which I assume
> is something to do with using a custom collector which is ignoring the
> scorer.)
>
> So I was wondering, is this comment just out of date? It seems that by
> using explain(), I get the same information I get by querying the
> whole index, *plus* information about the score which the custom
> collector wasn't recording, all in less than half the time it took to
> query the whole index.
>
>
Explain is not performant... but I think the comment is fair? It's more of a
worst case, and it depends on the query.
Explain is going to rewrite the query, create the weight, and so on, just to
advance() the scorer to that single doc.
So if this is e.g. a wildcard query, then it could definitely be almost as
slow as searching the whole index, since the rewrite involves scanning
through the term dictionary to find the matching terms.
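
To make the advance() point concrete, here is a toy sketch in plain Java. It is
not Lucene code: the class AdvanceSketch, the Postings iterator, and the methods
matchesBoth and countMatches are all hypothetical names made up for this
illustration, and the postings are modelled as sorted int arrays. It only shows
why checking one doc against a two-term MUST query can be much cheaper than
walking every match, once the rewrite and weight creation are out of the way.

```java
import java.util.Arrays;

public class AdvanceSketch {

    /** Toy postings iterator over a sorted array of doc IDs (hypothetical, not Lucene's). */
    static final class Postings {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int pos = -1;

        Postings(int... sortedDocs) {
            this.docs = sortedDocs;
        }

        /** Moves to the first doc ID >= target and returns it, or NO_MORE_DOCS. */
        int advance(int target) {
            int from = Math.max(pos, 0);
            int idx = Arrays.binarySearch(docs, from, docs.length, target);
            if (idx < 0) {
                idx = -idx - 1; // insertion point: first entry greater than target
            }
            pos = idx;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Single-doc check: advance each term's postings straight to the doc. */
    static boolean matchesBoth(int[] termA, int[] termB, int doc) {
        return new Postings(termA).advance(doc) == doc
            && new Postings(termB).advance(doc) == doc;
    }

    /** Whole-index search: intersect both postings lists from start to end. */
    static int countMatches(int[] termA, int[] termB) {
        int i = 0, j = 0, count = 0;
        while (i < termA.length && j < termB.length) {
            if (termA[i] == termB[j]) {
                count++;
                i++;
                j++;
            } else if (termA[i] < termB[j]) {
                i++;
            } else {
                j++;
            }
        }
        return count;
    }
}
```

In this model matchesBoth touches only a handful of entries per term, while
countMatches visits every posting. The sketch leaves out the rewrite step
entirely, which is where something like a wildcard query would pay most of its
cost regardless of how many docs are checked.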
