lucene-java-user mailing list archives

From Josh Devins <j...@amenhq.com>
Subject Re: Filter and query precedence, boolean query
Date Sun, 23 Oct 2011 20:37:54 GMT
I'll reply to the thread with your comment from our IM chat in case it helps
anyone else thinking about this.

In response to which is preferred, a boolean query with term queries or a
term filter plus a term query, and whether clause order in the boolean query
matters:

we take care of this internally
no matter which order
if you don't need to score on one or another use a filter


My general sense is that since the resulting DocIdSet from the filter is
cacheable, consider this when building the query. That is, perhaps make that
part (the filter) of the overall query as reusable as possible to maximize
cache hits.
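
To make the caching point concrete, here is a minimal sketch using only the
JDK, not Lucene's own classes. In Lucene 3.x the comparable role is played by
CachingWrapperFilter; the class name FilterCacheSketch, the string cache key,
and the docsFor method below are hypothetical and exist only for illustration.
The idea is that a filter keyed on something stable is computed once and then
reused across different queries.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Illustration only: a hypothetical stand-in for a filter cache, not Lucene code.
public class FilterCacheSketch {
    // Cached doc id sets, keyed by a stable description of the filter.
    private final Map<String, BitSet> cache = new HashMap<>();
    // Counts how often a doc id set actually had to be built.
    int computations = 0;

    // Returns the cached set for filterKey, building it from matchingDocs on a miss.
    BitSet docsFor(String filterKey, int[] matchingDocs) {
        return cache.computeIfAbsent(filterKey, key -> {
            computations++;
            BitSet bits = new BitSet();
            for (int doc : matchingDocs) {
                bits.set(doc);
            }
            return bits;
        });
    }

    public static void main(String[] args) {
        FilterCacheSketch sketch = new FilterCacheSketch();
        int[] barDocs = {4, 12, 21};
        // Two different queries share the same filter, so it is built only once.
        sketch.docsFor("term:bar", barDocs);
        sketch.docsFor("term:bar", barDocs);
        System.out.println("filter built " + sketch.computations + " time(s)");
    }
}
```

A filter whose key changes with every request (for example one that embeds
the current timestamp) would miss on each call, which is why keeping the
filter part generic should help.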

Will peek at the source once I have some time ;)

Thanks,

Josh



On 23 October 2011 22:08, Simon Willnauer <simon.willnauer@googlemail.com> wrote:

> hey josh,
>
> On Sun, Oct 23, 2011 at 5:39 PM, Josh Devins <josh@amenhq.com> wrote:
> > Hi folks,
> >
> > I'm hoping someone can shed some light on how filters and boolean queries
> > work under the hood. As I understand it, the following two queries are
> > functionally equivalent:
> >
> > boolean must, term query: foo, boolean must, term query: bar
> > term query: foo, term filter: bar
>
> Their result sets are the same, but if you score you might get different
> scores.
>
> >
> > What I'd like to understand is:
> >
> > 1) How are boolean queries run by Lucene? Are both queries (term query: foo,
> > term query: bar) run and then set operation intersection performed to find
> > the final document set? Or is it a staged query where term query: foo runs
> > first, then term query: bar run on the subset returned from the first query
> > for foo?
>
> Lucene does document-at-a-time retrieval, so both TermQueries are
> evaluated at the same time. The BooleanScorer will advance both
> TermQueries until it finds a document containing both terms, etc.
> >
> > 2) When running the above query+filter, which is run first? Specifically, if
> > documents with the term 'foo' are an order of magnitude larger than the
> > documents with the term 'bar', should they be swapped in the above query so
> > that the results of the query are as small as possible before running the
> > filter. Or does the query run against the results of the filter?
>
> On the Lucene level, if you specify a filter, the filter's DocIdSet is
> pulled before the query is executed. However, it depends on the
> implementation whether the set is built ahead of time or during
> evaluation. Once the filter is created we use a leapfrog approach:
> initially we advance both the filter and the query to their first doc.
> If the docs match, we score the doc. If the filter's doc is greater than
> the query's doc, the query is advanced to the next doc greater than or
> equal to the filter's current doc; otherwise the roles are swapped and
> the filter is advanced. If you use something like QueryWrapperFilter (a
> filter created from a query), the query used to build the filter runs
> first. This applies to Lucene 3.x; in 4.0 we are currently changing how
> filters work, though.
>
>
> >
> > Hopefully this makes sense :)
>
> same here :)
>
> simon
> >
> > Thanks,
> >
> > Josh
> >
>
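
For anyone who wants to see the leapfrog approach Simon describes in the
quoted reply, here is a rough sketch over two sorted arrays of doc ids. It
is not Lucene's implementation: the class LeapfrogSketch and the methods
advance and intersect are hypothetical names, plain int arrays stand in for
the query's scorer and the filter's DocIdSet, and both arrays are assumed to
be sorted ascending without duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: leapfrog intersection of two sorted doc id lists.
public class LeapfrogSketch {

    // Index of the first element >= target in ids[from..], or ids.length if none.
    static int advance(int[] ids, int from, int target) {
        int pos = Arrays.binarySearch(ids, from, ids.length, target);
        return pos >= 0 ? pos : -(pos + 1);
    }

    // Collects the doc ids present in both the query's and the filter's lists.
    static List<Integer> intersect(int[] queryDocs, int[] filterDocs) {
        List<Integer> hits = new ArrayList<>();
        int q = 0;
        int f = 0;
        while (q < queryDocs.length && f < filterDocs.length) {
            int queryDoc = queryDocs[q];
            int filterDoc = filterDocs[f];
            if (queryDoc == filterDoc) {
                // Both sides sit on the same doc: this is where scoring would happen.
                hits.add(queryDoc);
                q++;
                f++;
            } else if (filterDoc > queryDoc) {
                // The query is behind: jump it to the filter's current doc or beyond.
                q = advance(queryDocs, q, filterDoc);
            } else {
                // The filter is behind: jump it to the query's current doc or beyond.
                f = advance(filterDocs, f, queryDoc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] fooDocs = {1, 4, 7, 9, 12, 15, 20}; // docs matching the query
        int[] barDocs = {4, 12, 21};              // docs accepted by the filter
        System.out.println(intersect(fooDocs, barDocs)); // prints [4, 12]
    }
}
```

Because whichever side is behind always jumps to the other side's current
doc, the result is the same whichever list is called the query and which the
filter, in line with the "no matter which order" comment above.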
