From: Nathan Kurz
Date: Mon, 11 Apr 2011 22:41:50 -0700
To: lucy-dev@incubator.apache.org
Cc: "David E. Wheeler"
Subject: Re: [lucy-dev] Who should perform query optimization?

On Mon, Apr 11, 2011 at 8:23 PM, David E. Wheeler wrote:
> On Apr 11, 2011, at 4:32 PM, Marvin Humphrey wrote:
>
>> On Sun, Apr 10, 2011 at 12:08:05PM -0700, Nathan Kurz wrote:
>>> Query optimization is a great thing, but it should not happen behind the
>>> scenes.
>>
>> That's a really interesting perspective.
>
> One I disagree with, personally.

Please do disagree, ideally clearly and in great detail. :)

I realize that a number of things I'm suggesting are impossible, or at
least "impossible". But some of them really are roadblocks for
problems I'd like to someday solve, ideally without having to write a
search engine from scratch. My goal is to open up the architecture of
Lucy so that it works for my needs, although I realize that these
needs are not universal.

>> We would expect something like psql, the command-line interface to PostgreSQL,
>> to perform implicit query optimization "behind the scenes" when an end-user
>> supplies a query as SQL text. We would likewise expect a search engine app
>> based on Lucy to perform implicit query optimization when an end-user supplies
>> a text query string.
>
> psql doesn't do that. The server back end does it. The front-end just passes
> the queries to the back end. From KinoSearch's perspective, you should
> pretend there is no psql, just DBD::Pg (or libpq).

This might be a matter of how one views Lucy. Postgres is a server,
and libpq is (I think) just a thin client that passes queries to that
server. By contrast, I see Lucy as a toolkit for developing search
applications. Rather than being a black box (query in, results out),
I want to use Lucy to develop other semi-opaque boxes.
I want it to provide a clear framework for adding custom layers, and
this is only possible if the layers are well defined.

>> So what we're talking about here instead is Lucy's programmatic, OO interface.
>> Several Searcher methods accept a Query object as an argument. Should
>> Searcher perform query optimization internally, or should it assume that the
>> Query has been fully optimized already?
>>
>> Put another way: Should query optimization be the domain of the application,
>> or the library?

I'm not actually arguing that it should be up to the application;
rather, I just want it to happen "out in the open" to the extent that
is possible. The "black box" approach would go something like:

  my $searcher = new Searcher("index");
  my $results = $searcher->Search("text query");

I'm OK with that, so long as those are considered convenience methods
rather than the real API. What I want (allegory rather than explicit)
is for Searcher::Search to internally have some clearly defined
layers, something like:

  my $query = new Query("text query");
  my $optimized = Query::Optimize($query);
  ...

Now, for some of the reasons that Marvin points out, there are certain
things that this approach just can't do without knowledge of the
actual index. But I think there are a lot of things that it can do,
and simply making this layer explicit will make it easier to swap in a
different approach. If nothing else, it makes for easy testing of the
Optimizer, as one can easily run the same query with and without
Optimization and see if the results change.

> Nice that it's there, but it should be damn near impossible for a user to
> optimize a query better than the core does, IMHO.

It's not actually that I want to optimize things better than the core,
else I'd just try to fix the core. Instead, I want a flexible core.
I want to make sure there is a way to stick in my own layers (and to
replace the layers that are there) without having to rewrite every
class all the way down. I think this would be easier if optimization
was restricted to the Query creation phase, and the "engine" just ran
the query it was given.

--nate
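
The layering argued for in this message can be illustrated with a small, self-contained sketch. Everything in it is hypothetical and is not Lucy's API: the query representation (hashes with a `term` or `all` key), the `optimize`, `matches`, and `search` functions, and the in-memory document list are all assumptions made for illustration. It shows optimization as an explicit step that yields a new query, an "engine" that runs whatever query it is given, and the testing benefit of running the same query with and without optimization.

```perl
use strict;
use warnings;

# Hypothetical toy model, not Lucy's API. A query is either
#   { term => 'foo' }   or   { all => [ subquery, ... ] }  (logical AND).

# Optimization as an explicit, separate layer: flatten nested ANDs and
# drop duplicate terms. Returns a new query; never looks at an index.
sub optimize {
    my ($query) = @_;
    return { term => $query->{term} } if exists $query->{term};

    my ( @children, %seen );
    for my $child ( map { optimize($_) } @{ $query->{all} } ) {
        # An optimized AND holds only terms, so splice them into the parent.
        my @flat = exists $child->{all} ? @{ $child->{all} } : ($child);
        for my $node (@flat) {
            next if $seen{ $node->{term} }++;
            push @children, $node;
        }
    }
    return @children == 1 ? $children[0] : { all => \@children };
}

# The "engine" runs exactly the query it is given, with no rewriting.
# Matching is a plain substring test to keep the sketch short.
sub matches {
    my ( $query, $doc ) = @_;
    return index( $doc, $query->{term} ) >= 0 if exists $query->{term};
    for my $child ( @{ $query->{all} } ) {
        return 0 unless matches( $child, $doc );
    }
    return 1;
}

# Returns the positions of the matching documents.
sub search {
    my ( $query, @docs ) = @_;
    return grep { matches( $query, $docs[$_] ) } 0 .. $#docs;
}

my @docs = (
    'query optimization in the open',
    'black box search',
    'open query layers',
);
my $raw = {
    all => [
        { term => 'query' },
        { all  => [ { term => 'open' }, { term => 'query' } ] },
    ],
};
my $optimized = optimize($raw);

# Same engine, same documents, with and without the optimization layer.
my @plain = search( $raw,       @docs );
my @tuned = search( $optimized, @docs );
print "raw: @plain\noptimized: @tuned\n";
```

Because `optimize` is a plain function from query to query, a different optimizer can be swapped in, or the step skipped, without touching `search`.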