Date: Sat, 2 Apr 2011 11:03:10 -0700
From: Marvin Humphrey
To: lucy-dev@incubator.apache.org, peter@peknet.com
Message-ID: <20110402180310.GA13116@rectangular.com>
In-Reply-To: <4D9715C9.206@peknet.com>
Subject: Re: [lucy-dev] Three class name changes

On Sat, Apr 02, 2011 at 07:25:45AM -0500, Peter Karman wrote:
> >
> > The second change is to rename Lucy::Search::Compiler to
> > Lucy::Search::Investigation, "A Query applied to a
> > specific collection of
> > documents." With this change, each Query-Compiler-Matcher trio will become a
> > Query-Investigation-Matcher trio instead. I know that Nate would prefer to
> > eliminate the middle stage entirely, creating Query-Matcher pairs. The name
> > change to Investigation is not meant to prejudice the decision to zap or not
> > to zap, which is too involved to tackle prior to the 0.1.0-incubating release.
> >
>
> Investigation seems a little awkward as a name.

True. "Investigation" has too many syllables, and it will lead to long symbol
names at the C level:

    lucy_NoMatchInvestigation *investigation = lucy_NoMatchInvest_new();

Other names along the same theme that I considered were "Probe" and "Inquiry".
The downside of "Probe" is that it's already claimed by Charmonizer. (Sample
usage in email conversation: "It looks like we need a Charmonizer Probe for
S_IFDIR.") IMO, it's undesirable to overload "Probe" with one meaning within
the Lucy core and another within Charmonizer.

"Inquiry" might be a possibility, though. I originally discarded it because I
thought it sounded a little too close to "Query", but it's not awkward like
"Investigation". What do you think?

> The docs for the Compiler class say:
>
> "The purpose of the Compiler class is to take a specification in the form of a
> Query object and compile a Matcher object that can do real work."

Yes, that's the role of Compiler that we currently emphasize. It's possible to
see it from other perspectives, though.

Another way to think of this class is as the container which holds state for
weighting information generated when a Query is applied against a corpus.
That's the role that Lucene chooses to emphasize -- in Lucene, the analogue to
this class is called "Weight".

I think the name "Weight" is quite unfortunate, though.
It's hard to see a variable named "weight" as anything other than a scalar
numeric quantity, which makes for code which doesn't read very well and email
discussions which are hard to follow. "WeightedQuery" would be more accurate;
the Lucene folks contemplated "QueryWeight" for a while, as well. I dislike
all of those.

The Lucy class is slightly different from the Lucene class, too. In Lucy,
these objects are subclasses of Query, but in Lucene, weights are not queries.
Additionally, in Lucy we've given these objects a very important and active
role in highlighting.

The names "Investigation", "Probe", and "Inquiry" give this class a different
identity than "Compiler" -- they are all intended to convey the impression of
"A Query that has gotten serious" instead of "a factory for Matchers". Some
documentation and Cookbook material will need to be reworked subtly to adapt
to the new identity.

I don't think the change in emphasis is substantially better or worse than
what we have now, but I do think that the current class *name* has significant
deficiencies -- and re-envisioning the class's role opens up our naming
options.

> What's wrong with Compiler?

"Compiler" has two problems.

The first is that the word "compiler" is already loaded with meaning. This was
less of an issue 3 years ago when we refactored Weight and renamed it
"Compiler" because KinoSearch was still primarily talked about from a Perl
perspective -- and you don't talk about compilers very often in a Perl
context. These days, though, we talk about compilers all the time -- C
compilers, the Clownfish compiler, etc. -- and that's only going to intensify.

Now, when I say "It's the Compiler's job to create raw highlighting data",
that sounds strange. If you're not familiar with the Compiler class, you're
going to think I meant the C compiler -- and what on earth could the C
compiler have to do with highlighting?

The second problem with "Compiler" is that it produces poor subclass names.
A "TermCompiler" doesn't compile terms, and a "PhraseCompiler" doesn't compile
phrases. Names like "TermInquiry", "PhraseInquiry", "TermInvestigation", and
"PhraseInvestigation" don't have that problem.

> It compiles a Matcher. MatchMaker? Investigation is a kind of a passive
> noun. Investigator?

"Investigator" is kind of neat. However, it has the same subclass naming issue
as "Compiler": a "PhraseInvestigator" doesn't investigate phrases.

"MatchMaker" is a little strange because these objects are factories for
Matchers, not matches. We're also using the word "match" an awful lot these
days and I'm reluctant to keep piling on. It's cool that "MatchMaker" still
emphasizes the factory role, though.

> If this sounds like bike-shedding forgive me. I guess I just don't see the
> problem with Compiler.

Class naming is a difficult and important task, central to OO interface design
-- good class names make understanding a library's structure much easier and
using it more intuitive. It's impossible to get class naming right without
user feedback, though, just like it's impossible to get web page interface
design right without user testing. I'm glad you spoke up.

Marvin Humphrey