lucy-dev mailing list archives

From: Marvin Humphrey
Subject: Re: [lucy-dev] Three class name changes
Date: Sat, 02 Apr 2011 18:03:10 GMT

On Sat, Apr 02, 2011 at 07:25:45AM -0500, Peter Karman wrote:
> > 
> > The second change is to rename Lucy::Search::Compiler to
> > Lucy::Search::Investigation, "A Query applied to a specific collection of
> > documents."  With this change, each Query-Compiler-Matcher trio will become a
> > Query-Investigation-Matcher trio instead.  I know that Nate would prefer to
> > eliminate the middle stage entirely, creating Query-Matcher pairs.  The name
> > change to Investigation is not meant to prejudice the decision to zap or not
> > to zap, which is too involved to tackle prior to the 0.1.0-incubating release.
> > 
> Investigation seems a little awkward as a name. 


"Investigation" has too many syllables, and it will lead to long symbol names
at the C level:

    lucy_NoMatchInvestigation *investigation 
        = lucy_NoMatchInvest_new();

Other names along the same theme that I considered were "Probe" and "Inquiry".

The downside of "Probe" is that it's already claimed by Charmonizer.  (Sample
usage in email conversation: "It looks like we need a Charmonizer Probe for
S_IFDIR.")  IMO, it's undesirable to overload "Probe" with one meaning within
the Lucy core and another within Charmonizer.

"Inquiry" might be a possibility, though.  I originally discarded it because I
thought it sounded a little too close to "Query", but it's not awkward like
"Investigation".  What do you think?

> The docs for the Compiler class say:
>  "The purpose of the Compiler class is to take a specification in the form of a
> Query object and compile a Matcher object that can do real work."

Yes, that's the role of Compiler that we currently emphasize.  It's possible
to see it from other perspectives, though.

Another way to think of this class is as the container which holds state for
weighting information generated when a Query is applied against a corpus.
That's the role that Lucene chooses to emphasize -- in Lucene, the analogue to
this class is called "Weight".

I think the name "Weight" is quite unfortunate, though.  It's hard to see a
variable named "weight" as anything other than a scalar numeric quantity,
which makes for code which doesn't read very well and email discussions which
are hard to follow.  "WeightedQuery" would be more accurate; the Lucene folks
contemplated "QueryWeight" for a while, as well.  I dislike all of those.

The Lucy class is slightly different from the Lucene class, too.  In Lucy,
these objects are subclasses of Query, but in Lucene, weights are not queries.
Additionally, in Lucy we've given these objects a very important and active
role in highlighting.
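
As a rough illustration of the "container which holds state for weighting
information" view described above, here is a minimal C sketch.  Every name in
it (SketchQuery, SketchInquiry, sketch_inquiry_new, sketch_inquiry_score) is
hypothetical, and the weighting arithmetic is a deliberately simplified
stand-in, not Lucy's or Lucene's actual API or scoring formula.  It only
shows a corpus-independent specification being bound to corpus statistics,
with the result carrying the state that scoring later relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not Lucy's real structs. */

/* The abstract specification: what to look for, independent of any corpus. */
typedef struct {
    const char *term;
    float       boost;
} SketchQuery;

/* A Query applied to a specific collection of documents.  It keeps the
 * original specification plus weighting state derived from corpus stats. */
typedef struct {
    SketchQuery query;
    float       idf;     /* simplified rarity factor for this corpus */
    float       weight;  /* boost * idf, cached for scoring */
} SketchInquiry;

/* Bind a query to corpus statistics.  The formula is a toy stand-in. */
SketchInquiry
sketch_inquiry_new(const SketchQuery *query, unsigned doc_count,
                   unsigned doc_freq) {
    SketchInquiry inquiry;
    assert(query != NULL);
    inquiry.query  = *query;
    inquiry.idf    = (float)doc_count / (float)(doc_freq + 1);
    inquiry.weight = query->boost * inquiry.idf;
    return inquiry;
}

/* What a Matcher built from the inquiry might compute for one document. */
float
sketch_inquiry_score(const SketchInquiry *inquiry, unsigned term_freq) {
    assert(inquiry != NULL);
    return inquiry->weight * (float)term_freq;
}
```

With a boost of 2.0 and a corpus of 100 documents, 9 of which contain the
term, this sketch yields an idf of 10.0 and a weight of 20.0; a document where
the term occurs 3 times then scores 60.0.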

The names "Investigation", "Probe", and "Inquiry" give this class a different
identity than "Compiler" -- they are all intended to convey the impression of
"A Query that has gotten serious" instead of "a factory for Matchers".  Some
documentation and Cookbook material will need to be reworked subtly to adapt
to the new identity.  I don't think the change in emphasis is substantially
better or worse than what we have now, but I do think that the current class
*name* has significant deficiencies -- and re-envisioning the class's role
opens up our naming options.

> What's wrong with Compiler?

"Compiler" has two problems.

The first is that the word "compiler" is already loaded with meaning.  This
was less of an issue 3 years ago when we refactored Weight and renamed it
"Compiler" because KinoSearch was still primarily talked about from a Perl
perspective -- and you don't talk about compilers very often in a Perl
context.

These days, though, we talk about compilers all the time -- C compilers, the
Clownfish compiler, etc. -- and that's only going to intensify.  Now, when I
say "It's the Compiler's job to create raw highlighting data", that sounds
strange.  If you're not familiar with the Compiler class, you're going to
think I meant the C compiler -- and what on earth could the C compiler have to
do with highlighting?

The second problem with "Compiler" is that it produces poor subclass names.  A
"TermCompiler" doesn't compile terms, and a "PhraseCompiler" doesn't compile
phrases.  Names like "TermInquiry", "PhraseInquiry", "TermInvestigation", and
"PhraseInvestigation" don't have that problem.

> It compiles a Matcher. MatchMaker? Investigation is a kind of a passive
> noun. Investigator?

"Investigator" is kind of neat.  However, it has the same subclass naming issue
as "Compiler": a "PhraseInvestigator" doesn't investigate phrases.

"MatchMaker" is a little strange because these objects are factories for
Matchers, not matches.  We're also using the word "match" an awful lot these
days and I'm reluctant to keep piling on.  It's cool that "MatchMaker" still
emphasizes the factory role, though.

> If this sounds like bike-shedding forgive me. I guess I just don't see the
> problem with Compiler.

Class naming is a difficult and important task, central to OO interface 
design -- good class names make understanding a library's structure much
easier and using it more intuitive.  It's impossible to get class naming right
without user feedback, though, just like it's impossible to get web page
interface design right without user testing.  I'm glad you spoke up.

Marvin Humphrey
