lucene-java-user mailing list archives

From Ashish Jaen <>
Subject Re: Boolean Query: Knowing Which Clauses Matched
Date Wed, 18 Jul 2012 14:22:34 GMT
It would be great if someone could show how to do it.
For my application, I do not care about scores (a plain boolean
search is sufficient).

In the meantime, I experimented with a workaround and would like to
share the findings:

Problem details:
On a collection of 10 million documents, I want to run boolean queries.
These boolean queries act as document classifiers for us, and there are
about 1500 of them (each with about 300 boolean clauses). If a
document matches a query, we want to know which parts of the boolean
query matched it (this is a BI application that does text analytics,
and we need the counts for each matched boolean clause for statistics).

As a workaround, I create a filter from the original boolean query, cache
it, and then run each boolean sub-query against that filter. This has
given me a large performance gain (these are initial observations; I am
still evaluating the performance).

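Some pseudo-code (searcher and collector are assumed to exist already):
// Wrap the full boolean query as a filter and cache its matching doc set.
Filter filter = new QueryWrapperFilter(bigBooleanQuery);
CachingWrapperFilter cachingFilter = new CachingWrapperFilter(filter);

// Run each sub-query restricted to the docs matched by the full query;
// use a fresh collector per clause if per-clause counts are needed.
for (BooleanClause clause : bigBooleanQuery.clauses()) {
    searcher.search(clause.getQuery(), cachingFilter, collector);
}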
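
To make the counting step concrete without depending on Lucene, here is a toy model in plain Java. It is only a sketch of the idea, not Lucene code: each java.util.BitSet stands in for a set of matching doc ids, cachedFilter plays the role of the cached filter built from the whole query, and the class and method names (ClauseMatchCounts, countsPerClause) are made up for illustration.

```java
import java.util.BitSet;
import java.util.List;

// Toy model only: doc-id bit sets stand in for Lucene filters and
// sub-query results. Nothing here is part of the Lucene API.
final class ClauseMatchCounts {
    private ClauseMatchCounts() {}

    // For each clause, count the documents that are both in the clause's
    // own result set and in the cached filter for the whole query.
    static int[] countsPerClause(BitSet cachedFilter, List<BitSet> clauseDocs) {
        int[] counts = new int[clauseDocs.size()];
        for (int i = 0; i < clauseDocs.size(); i++) {
            // Work on a copy so the caller's bit sets are not modified.
            BitSet hits = (BitSet) clauseDocs.get(i).clone();
            hits.and(cachedFilter);
            counts[i] = hits.cardinality();
        }
        return counts;
    }
}
```

In this model the filter is computed once and reused for every clause, which is the same reason the cached filter helps in the workaround above: each sub-query only has to be checked against documents the full query already matched.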

On Wed, Jul 18, 2012 at 9:25 PM, Michael McCandless <> wrote:

> This is possible, using the ScorerVisitor (3.6) / getChildren (4.0).
> You need a custom collector that when it collects a competitive hit,
> visits the sub-scorers of your BooleanQuery and saves away which ones
> matched the current doc.
> But this is very expert and there are real challenges (eg not all
> scorers score document-at-a-time) ... would be nice if someone wrote
> up some example code showing how to do it...
> Mike McCandless
> On Wed, Jul 18, 2012 at 7:17 AM, Ashish Jaen <> wrote:
> > Is there a way to know which sub-clauses of a boolean query matched a
> > result document? Currently I am using searcher.explain() on each
> > sub-clause of the boolean query (for each of the documents returned by
> > the searcher). However, this is turning out to be very slow, as I need to
> > process ALL the documents returned by the query. (A typical query returns
> > about 20 thousand documents, and my collection has 10 million docs. My
> > application is not a user-facing one, so a few seconds per query is still
> > acceptable.)
> >
> > I was wondering if there is an efficient way to achieve the above that
> > does not use explain() (perhaps storing the information about which
> > sub-clause matched a document while searching). Can anyone suggest a
> > method to solve this and point to the relevant classes that would need
> > to be changed?
> >
> > Thanks,
> > -Ashish