lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject [lucy-dev] Who should perform query optimization?
Date Mon, 11 Apr 2011 23:32:57 GMT
On Sun, Apr 10, 2011 at 12:08:05PM -0700, Nathan Kurz wrote:
> Query optimization is a great thing, but it should not happen behind the
> scenes.

That's a really interesting perspective.  

We would expect something like psql, the command-line interface to PostgreSQL,
to perform implicit query optimization "behind the scenes" when an end-user
supplies a query as SQL text.  We would likewise expect a search engine app
based on Lucy to perform implicit query optimization when an end-user supplies
a text query string.

So what we're really talking about here is Lucy's programmatic, OO interface.
Several Searcher methods accept a Query object as an argument.  Should
Searcher perform query optimization internally, or should it assume that the
Query has been fully optimized already?

Put another way: Should query optimization be the domain of the application,
or the library?

In my opinion, we have to allow Searcher to perform query optimization
internally.  I've argued elsewhere that for maximum efficiency, optimization
needs to happen at the segment level, during the stage where we compile a
Matcher.  If we were to establish a rule that all optimization had to happen
outside of Searcher, we wouldn't be able to exploit all the per-segment data
that's available to us and we'd leave potential optimizations on the table.
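
To make the per-segment point concrete, here is a toy sketch in C.  It is not
Lucy code: SegStats, AndPlan, and plan_and_query() are hypothetical names
invented for illustration, and the "lead with the rarer term" heuristic is
just one plausible optimization.  The idea is that the same "A AND B" query
can deserve a different plan in each segment, which only becomes visible once
a segment's own statistics are in hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment statistics; NOT Lucy's real SegReader. */
typedef struct {
    uint32_t doc_max;     /* number of documents in this segment */
    uint32_t doc_freq_a;  /* docs in this segment containing term A */
    uint32_t doc_freq_b;  /* docs in this segment containing term B */
} SegStats;

/* Hypothetical plans for evaluating "A AND B" within one segment. */
typedef enum {
    PLAN_NO_MATCH,     /* a required term is absent: skip the segment */
    PLAN_LEAD_WITH_A,  /* iterate A's postings, probe B for each hit */
    PLAN_LEAD_WITH_B   /* iterate B's postings, probe A for each hit */
} AndPlan;

/* Choose a plan using only the statistics of the segment at hand. */
static AndPlan
plan_and_query(const SegStats *seg) {
    assert(seg != NULL);
    if (seg->doc_freq_a == 0 || seg->doc_freq_b == 0) {
        return PLAN_NO_MATCH;
    }
    /* Drive the intersection from the rarer term's posting list. */
    return seg->doc_freq_a <= seg->doc_freq_b
           ? PLAN_LEAD_WITH_A
           : PLAN_LEAD_WITH_B;
}
```

Called once per segment, a function like this can pick PLAN_LEAD_WITH_A in
one segment, PLAN_LEAD_WITH_B in another, and skip a third entirely.  An
optimizer that runs before Searcher ever sees the Query has no access to
those per-segment numbers.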

So then, if we're going to continue performing optimization internally, how do
we make it happen out in the open when the user so desires, rather than
"behind the scenes"?

The answer is that the user can already make that happen if they want.
Compiler subclasses Query, so any Searcher method that takes a Query can take
a Compiler instead.  

    my $compiler = $query->make_compiler(searcher => $searcher);
    my $hits = $searcher->hits(query => $compiler);

And it's the Compiler object that actually controls the
optimization process.  Searcher's involvement is limited to invoking
Compiler_Make_Matcher():

    Matcher *matcher = 
        Compiler_Make_Matcher(compiler, seg_reader, need_score);

To disable optimization, or enact custom optimizations, the user needs to
implement custom Compiler classes.  (For now, that means they must also
implement custom Query classes, since Queries serve as factories for
Compilers.)
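
The division of labor can be sketched with a small stand-alone C model.
Everything here is an assumption made for illustration (the struct layouts,
the MatcherKind enum, and all function names); it does not reproduce Lucy's
actual class definitions, and it leaves out the fact that Compiler subclasses
Query.  It only shows the shape of the arrangement: the Query is a factory
for a Compiler, the Compiler decides what Matcher a segment gets, and the
searcher does nothing but ask.

```c
#include <assert.h>
#include <stddef.h>

/* Toy segment: records only whether the query term occurs in it. */
typedef struct { int term_present; } Segment;

/* Toy stand-in for "which Matcher did we build?" */
typedef enum { MATCHER_NONE, MATCHER_FULL_SCAN } MatcherKind;

typedef struct Compiler Compiler;
struct Compiler {
    MatcherKind (*make_matcher)(const Compiler *self, const Segment *seg);
};

typedef struct Query Query;
struct Query {
    const Compiler *(*make_compiler)(const Query *self);
};

/* Stock behavior: skip segments that cannot match. */
static MatcherKind
optimizing_make_matcher(const Compiler *self, const Segment *seg) {
    (void)self;
    return seg->term_present ? MATCHER_FULL_SCAN : MATCHER_NONE;
}

/* Custom behavior: optimization disabled, always build a matcher. */
static MatcherKind
plain_make_matcher(const Compiler *self, const Segment *seg) {
    (void)self;
    (void)seg;
    return MATCHER_FULL_SCAN;
}

static const Compiler OPTIMIZING_COMPILER = { optimizing_make_matcher };
static const Compiler PLAIN_COMPILER      = { plain_make_matcher };

/* Each Query "class" serves as the factory for its own Compiler. */
static const Compiler *
default_make_compiler(const Query *self) {
    (void)self;
    return &OPTIMIZING_COMPILER;
}

static const Compiler *
plain_make_compiler(const Query *self) {
    (void)self;
    return &PLAIN_COMPILER;
}

static const Query DEFAULT_QUERY = { default_make_compiler };
static const Query PLAIN_QUERY   = { plain_make_compiler };

/* The searcher's whole involvement: obtain a Compiler, ask for a Matcher. */
static MatcherKind
searcher_matcher_for(const Query *query, const Segment *seg) {
    assert(query != NULL && seg != NULL);
    const Compiler *compiler = query->make_compiler(query);
    return compiler->make_matcher(compiler, seg);
}
```

In this model, handing PLAIN_QUERY to searcher_matcher_for() yields a full
scan even for a segment where the term is absent, while DEFAULT_QUERY skips
that segment.  The searcher code is identical in both cases; only the
Compiler supplied by the Query differs.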

Honestly, I doubt that many users will seek to exert manual control over query
optimization -- but if they want to, they can.

Marvin Humphrey

