incubator-lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject [lucy-dev] When to perform query optimization
Date Mon, 11 Apr 2011 18:26:02 GMT

On Sun, Apr 10, 2011 at 12:08:05PM -0700, Nathan Kurz wrote:
> I'm going to try to chip off some small pieces and deal with them
> individually.   As a result, I may have a number of threads going at
> once.  

Sounds good!

I'm going to do the same, changing the subject liberally so that the web 
archive of our conversations will be as easy to scan and to search as
possible.

> Equally, if they want to start with a QueryParser generated Query and
> adjust it, for example by adding an optimization pass, they can do so.

It's theoretically possible to perform a certain amount of optimization at the
abstract Query stage.  For instance, you can make the following simplification:

    foo AND (((bar)))    -->    foo AND bar 

However, you don't yet have all the information -- you have neither corpus
nor segment statistics -- so your optimization pass will not be as thorough
as it could be.
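
To make the abstract-stage simplification concrete, here is a minimal sketch
in Python.  The Term, And, and Group classes and the simplify() function are
hypothetical names invented for illustration; they are not Lucy's Query
classes.  The pass only strips redundant parenthesized groups, which is the
kind of rewrite that needs no corpus or segment statistics.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    """A single query term (hypothetical stand-in for a term query)."""
    text: str


@dataclass(frozen=True)
class Group:
    """A parenthesized sub-query with exactly one child."""
    child: "Node"


@dataclass(frozen=True)
class And:
    """A conjunction of sub-queries."""
    children: Tuple["Node", ...]


Node = Union[Term, Group, And]


def simplify(node):
    """Strip redundant parentheses; uses no index statistics at all."""
    if isinstance(node, Term):
        return node
    if isinstance(node, Group):
        # (((bar))) collapses to bar, one layer per recursive call.
        return simplify(node.child)
    # And node: simplify each child, keep the conjunction itself.
    return And(tuple(simplify(child) for child in node.children))


# foo AND (((bar)))
query = And((Term("foo"), Group(Group(Group(Term("bar"))))))
simplified = simplify(query)
```

After the pass, simplified is the two-term conjunction foo AND bar.  Nothing
in simplify() can tell whether 'foo' or 'bar' occurs anywhere in the index,
which is the limitation described above.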

Lucy currently does its optimization on a per-segment basis, when compiling
the Matcher.  Say that you're searching for the following phrase:

    "macarthur park"

If a given segment doesn't contain the term 'macarthur', you know that the
phrase can never match -- so Matcher compilation can fail fast and you can
proceed immediately to the next segment.  
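
A rough sketch of that fail-fast behavior, again in Python.  Everything here
is an assumption made for illustration: a segment is modeled as a plain dict
mapping term -> {doc_id: [positions]}, and compile_phrase_matcher(),
match_phrase(), and search() are made-up names, not Lucy's API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical segment model: term -> {doc_id: [positions]}
Postings = Dict[int, List[int]]
Segment = Dict[str, Postings]


def compile_phrase_matcher(segment: Segment,
                           terms: List[str]) -> Optional[List[Postings]]:
    """Return per-term postings, or None if the phrase cannot match here."""
    postings = []
    for term in terms:
        plist = segment.get(term)
        if not plist:
            # Term absent from this segment: fail fast, no matcher needed.
            return None
        postings.append(plist)
    return postings


def match_phrase(postings: List[Postings]) -> List[int]:
    """Return doc ids where the terms occur at consecutive positions."""
    hits = []
    for doc_id, positions in postings[0].items():
        for pos in positions:
            if all(doc_id in p and (pos + i) in p[doc_id]
                   for i, p in enumerate(postings)):
                hits.append(doc_id)
                break
    return sorted(hits)


def search(segments: List[Segment],
           terms: List[str]) -> List[Tuple[int, int]]:
    """Collect (segment number, doc id) hits, skipping hopeless segments."""
    results = []
    for seg_no, segment in enumerate(segments):
        matcher = compile_phrase_matcher(segment, terms)
        if matcher is None:
            continue  # proceed immediately to the next segment
        results.extend((seg_no, doc_id) for doc_id in match_phrase(matcher))
    return results


segments = [
    {"park": {0: [1]}},                                  # no 'macarthur'
    {"macarthur": {3: [0]}, "park": {3: [1], 4: [0]}},
]
hits = search(segments, ["macarthur", "park"])
```

The first segment is rejected during compilation without any position
checks, because 'macarthur' is missing from it; only the second segment is
actually scanned.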

You can't exploit that sort of optimization when examining a Query in the 
abstract, before it is associated with a corpus.  You can't fully exploit that
sort of optimization at the weighting stage within the top-level Searcher,
either, because some segments might contain 'macarthur' while others don't.

Maximum query optimization is only possible at the segment level.

Marvin Humphrey

