lucy-dev mailing list archives

From Marvin Humphrey
Subject [lucy-dev] When to perform query optimization
Date Mon, 11 Apr 2011 18:26:02 GMT
On Sun, Apr 10, 2011 at 12:08:05PM -0700, Nathan Kurz wrote:
> I'm going to try to chip off some small pieces and deal with them
> individually.   As a result, I may have a number of threads going at
> once.  

Sounds good!

I'm going to do the same, changing the subject liberally so that the web
archive of our conversations will be as easy to scan and to search as
possible.

> Equally, if they want to start with a QueryParser generated Query and
> adjust it, for example by adding an optimization pass, they can do so.

It's theoretically possible to perform a certain amount of optimization at the
abstract Query stage.  For instance, you can make the following simplification:

    foo AND (((bar)))    -->    foo AND bar 

However, you don't yet have all the information -- you have neither corpus
nor segment statistics -- so your optimization pass will not be as thorough
as it could be.
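
As a rough sketch of that kind of purely structural simplification, here is
a small Python example. The Term, Group and And classes and the simplify()
function are hypothetical names made up for illustration; they are not
Lucy's Query classes. The pass needs no index statistics at all.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical query tree types for illustration only (not Lucy's API).
@dataclass(frozen=True)
class Term:
    text: str


@dataclass(frozen=True)
class Group:
    """A parenthesized sub-query, e.g. (bar)."""
    child: "Node"


@dataclass(frozen=True)
class And:
    children: Tuple["Node", ...]


Node = Union[Term, Group, And]


def simplify(node: Node) -> Node:
    """Strip redundant grouping and flatten nested ANDs."""
    if isinstance(node, Group):
        # Parentheses around a single sub-query add nothing; unwrap them.
        return simplify(node.child)
    if isinstance(node, And):
        flat = []
        for child in node.children:
            child = simplify(child)
            if isinstance(child, And):
                # (a AND (b AND c)) is equivalent to (a AND b AND c).
                flat.extend(child.children)
            else:
                flat.append(child)
        # An AND with exactly one clause is just that clause.
        return flat[0] if len(flat) == 1 else And(tuple(flat))
    return node


# foo AND (((bar)))
query = And((Term("foo"), Group(Group(Group(Term("bar"))))))
simplified = simplify(query)
```

Here simplified is the two-clause AND of foo and bar, matching the rewrite
shown above.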

Lucy currently does its optimization on a per-segment basis, when compiling
the Matcher.  Say that you're searching for the following phrase:

    "macarthur park"

If a given segment doesn't contain the term 'macarthur', you know that the
phrase can never match within that segment -- so Matcher compilation can
fail fast and you can proceed immediately to the next segment.
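
A minimal sketch of that fail-fast behavior, in Python. Everything here is
an assumption for illustration: the Segment and PhraseMatcher classes, the
compile_phrase_matcher() function and the in-memory postings layout are
invented and do not reflect Lucy's actual Matcher compilation code.

```python
from typing import Dict, List, Optional


class Segment:
    """Hypothetical segment: term -> {doc_id: [positions]}."""

    def __init__(self, postings: Dict[str, Dict[int, List[int]]]):
        self.postings = postings


class PhraseMatcher:
    """Matches docs in which the terms occur at consecutive positions."""

    def __init__(self, plists: List[Dict[int, List[int]]]):
        self.plists = plists

    def matches(self) -> List[int]:
        first, rest = self.plists[0], self.plists[1:]
        hits = []
        for doc_id, positions in first.items():
            # The i-th following term must sit i + 1 positions after the
            # first term of the phrase.
            if any(all(pos + i + 1 in other.get(doc_id, ())
                       for i, other in enumerate(rest))
                   for pos in positions):
                hits.append(doc_id)
        return sorted(hits)


def compile_phrase_matcher(terms: List[str],
                           segment: Segment) -> Optional[PhraseMatcher]:
    """Return a matcher, or None if this segment cannot match."""
    plists = []
    for term in terms:
        plist = segment.postings.get(term)
        if not plist:
            # Fail fast: a missing term means the phrase cannot match here.
            return None
        plists.append(plist)
    return PhraseMatcher(plists)


phrase = ["macarthur", "park"]
seg_a = Segment({"macarthur": {1: [0]}, "park": {1: [1], 2: [5]}})
seg_b = Segment({"park": {7: [3]}})  # no 'macarthur' in this segment

matcher_a = compile_phrase_matcher(phrase, seg_a)
matcher_b = compile_phrase_matcher(phrase, seg_b)
```

In this sketch matcher_b is None, so the caller can skip seg_b without
reading any postings for 'park', while matcher_a goes on to do the real
position checks.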

You can't exploit that sort of optimization when examining a Query in the 
abstract, before it is associated with a corpus.  You can't fully exploit that
sort of optimization at the weighting stage within the top-level Searcher,
either, because some segments might contain 'macarthur' while others won't.
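
To make the top-level limitation concrete, here is a small Python sketch
with made-up numbers. The per-segment document frequencies and the helper
names corpus_doc_freq() and segments_to_search() are hypothetical, not
anything taken from Lucy.

```python
from typing import Dict, List

# Hypothetical document frequencies, one dict per segment.
segments: List[Dict[str, int]] = [
    {"macarthur": 3, "park": 40},
    {"park": 12},                  # 'macarthur' never occurs here
    {"macarthur": 1, "park": 9},
]


def corpus_doc_freq(term: str) -> int:
    """Corpus-wide doc freq, as a top-level searcher would see it."""
    return sum(seg.get(term, 0) for seg in segments)


def segments_to_search(terms: List[str]) -> List[int]:
    """Indexes of segments in which every term occurs at least once."""
    return [i for i, seg in enumerate(segments)
            if all(seg.get(t, 0) > 0 for t in terms)]


phrase = ["macarthur", "park"]

# At the corpus level every term exists somewhere, so nothing can be pruned.
can_prune_globally = any(corpus_doc_freq(t) == 0 for t in phrase)

# At the segment level the middle segment drops out.
searchable = segments_to_search(phrase)
```

With these numbers can_prune_globally is False, yet searchable leaves out
the middle segment: the corpus-wide view hides exactly the fact that lets
that segment be skipped.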

Maximum query optimization is only possible at the segment level.

Marvin Humphrey
