lucy-dev mailing list archives

From Nathan Kurz
Subject Re: [lucy-dev] Who should perform query optimization?
Date Tue, 12 Apr 2011 05:41:50 GMT
On Mon, Apr 11, 2011 at 8:23 PM, David E. Wheeler wrote:
> On Apr 11, 2011, at 4:32 PM, Marvin Humphrey wrote:
>> On Sun, Apr 10, 2011 at 12:08:05PM -0700, Nathan Kurz wrote:
>>> Query optimization is a great thing, but it should not happen behind the
>>> scenes.
>> That's a really interesting perspective.
> One I disagree with, personally.

Please do disagree, ideally clearly and in great detail.  :)

I realize that a number of things I'm suggesting are impossible, or at
least "impossible".  But some of them really are roadblocks for
problems I'd like to someday solve, ideally without having to write a
search engine from scratch.  My goal is to open up the architecture of
Lucy so that it works for my needs, although I realize that these
needs are not universal.

>> We would expect something like psql, the command-line interface to PostgreSQL,
>> to perform implicit query optimization "behind the scenes" when an end-user
>> supplies a query as SQL text.  We would likewise expect a search engine app
>> based on Lucy to perform implicit query optimization when an end-user supplies
>> a text query string.
> psql doesn't do that. The server back end does it. The front-end just passes the queries
> to the back end. From KinoSearch's perspective, you should pretend there is no psql, just
> DBD::Pg (or libpq).

This might be a matter of how one views Lucy.  Postgres is a server,
and libpq is (I think) just a thin client that passes queries to
that server.  By contrast, I see Lucy as a toolkit for developing
search applications.   Rather than being a black box (query in,
results out), I want to use Lucy to develop other semi-opaque boxes.
I want it to provide a clear framework for adding custom layers, and
this is only possible if the layers are well defined.

>> So what we're talking about here instead is Lucy's programmatic, OO interface.
>> Several Searcher methods accept a Query object as an argument.  Should
>> Searcher perform query optimization internally, or should it assume that the
>> Query has been fully optimized already?
>> Put another way: Should query optimization be the domain of the application,
>> or the library?

I'm not actually arguing that it should be up to the application;
rather, I just want it to happen "out in the open" to the extent
that is possible.  The "black box" approach would go something like:

my $searcher = new Searcher("index");
my $results = $searcher->Search("text query");

I'm OK with that, so long as those are considered convenience methods
rather than the real API.  What I want (as an illustration rather than
an exact API) is for Searcher::Search to internally have some clearly
defined layers, something like:

my $query = new Query("text query");
my $optimized = Query::Optimize($query);

Now, for some of the reasons that Marvin points out, there are certain
things that this approach just can't do without knowledge of the
actual index.   But I think there are a lot of things that it can do,
and simply making this layer explicit will make it easier to swap in a
different approach.  If nothing else, it makes for easy testing of the
Optimizer, as one can easily run the same query with and without
Optimization and see if the results change.
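
To make the testing point concrete, here is a minimal sketch in Python.
Everything in it (Term, And, parse, optimize, run) is hypothetical and
is not Lucy's API.  It assumes the optimizer only does index-independent
rewrites, here just dropping duplicate terms, so the same query can be
run with and without optimization and the results compared.

```python
from dataclasses import dataclass

# Hypothetical query tree; none of these names are Lucy's API.
@dataclass(frozen=True)
class Term:
    text: str

@dataclass(frozen=True)
class And:
    children: tuple

def parse(text):
    """Layer 1: whitespace-separated terms, all required."""
    return And(tuple(Term(w) for w in text.lower().split()))

def optimize(query):
    """Layer 2: index-independent rewrite.

    Drops duplicate children and unwraps a single-child And.
    """
    if isinstance(query, And):
        seen = []
        for child in map(optimize, query.children):
            if child not in seen:
                seen.append(child)
        return seen[0] if len(seen) == 1 else And(tuple(seen))
    return query

def run(query, docs):
    """Layer 3: the engine executes exactly the query it is given."""
    if isinstance(query, Term):
        return {i for i, d in enumerate(docs) if query.text in d.lower().split()}
    result = set(range(len(docs)))
    for child in query.children:
        result &= run(child, docs)
    return result

docs = ["query optimization in the open", "black box search", "open query layers"]
raw = parse("open query open")
opt = optimize(raw)

# The optimizer must not change which documents match.
results_agree = run(raw, docs) == run(opt, docs)
```

Because optimize is a separate step, a test can check results_agree
for any query without touching the engine.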

> Nice that it's there, but it should be damn near impossible for a user to optimize a
> query better than the core does, IMHO.

It's not actually that I want to optimize things better than the core,
else I'd  just try to fix the core.  Instead, I want a flexible core.
I want to make sure there is a way to stick in my own layers (and to
replace the layers that are there) without having to rewrite every
class all the way down.  I think this would be easier if optimization
were restricted to the Query creation phase, and the "engine" just ran
the query it was given.
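
A rough sketch of what replaceable layers could look like, again in
Python with hypothetical names (this Searcher class and its parse and
optimize arguments are assumptions for illustration, not Lucy's API).
The convenience method composes the layers, while execute runs whatever
it is handed without rewriting it.

```python
class Searcher:
    """Hypothetical searcher with swappable layers; not Lucy's API."""

    def __init__(self, docs, parse=None, optimize=None):
        self.docs = docs
        # Default layers: split on whitespace, then drop duplicate terms.
        self.parse = parse or (lambda text: text.lower().split())
        self.optimize = optimize or (lambda terms: sorted(set(terms)))

    def execute(self, terms):
        """Engine: run the term list exactly as given, no rewriting."""
        return [i for i, d in enumerate(self.docs)
                if all(t in d.lower().split() for t in terms)]

    def search(self, text):
        """Convenience method: parse, optimize, then execute."""
        return self.execute(self.optimize(self.parse(text)))


docs = ["the open core", "a flexible core", "open and flexible"]

default = Searcher(docs)

# A custom layer swapped in without touching the engine: it drops two
# stopwords, which changes the query and therefore the results.
custom = Searcher(
    docs,
    optimize=lambda terms: [t for t in terms if t not in {"the", "a"}],
)

default_hits = default.search("the core")
custom_hits = custom.search("the core")
direct_hits = default.execute(["open"])
```

Only the optimize argument differs between the two searchers; execute
is untouched and can still be called directly with a hand-built query.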

