incubator-lucy-dev mailing list archives

From Marvin Humphrey
Subject Re: Class hierarchy
Date Fri, 11 Sep 2009 22:59:52 GMT
On Fri, Sep 11, 2009 at 12:50:14PM -0600, Nathan Kurz wrote:
> On Fri, Sep 11, 2009 at 11:27 AM, Marvin Humphrey wrote:
> > Huffman coding naming principles dictate that classes whose names are typed
> > most often should have the shortest names.  Therefore, instead of locating
> > common classes within sub-trees, we should locate them at the first level --
> > directly underneath Lucy.
> I'm only heckling from the sidelines lately, but this produced an
> internal 'ug'.  

Heh.  I had a hunch that if any email to this list would generate responses,
it would be this one.  :\

> > Schema, Doc, QueryParser, and probably Indexer will all descend from Lucy::Obj.
> Might it be possible to rename Lucy::Obj to Lucy, so that everything
> in Lucy:: is a Lucy object?

That would degrade the clarity of the C code.  Variants of "Lucy" are already
used as the first level namespace differentiator.  Adding a "Lucy" type will
double up on our use of "lucy":

  lucy_Lucy *dupe = Lucy_Lucy_Clone(thing);

It's the "buffalo buffalo" problem, no?  :)

> It's boring, but I really like 'Class::Subclass::SubSubClass' schemes.

There will be too many first-level descendants.  Right now KinoSearch has
around 60 classes which extend Obj, not including test classes.  If we dump
everything into Lucy/, we'll get a big mess.  There's no choice but to break
stuff up into multiple directories by general topic.

And yet, we have a constraint imposed by our C naming scheme.  In order to
avoid horrendously long symbols, we only use one level of namespacing
beyond the "lucy" prefix:

  lucy_HitCollector *collector = (lucy_HitCollector*)lucy_BitColl_new(bit_vec);

Consider the alternative:

  lucy_HitCollector_BitCollector *collector 
    = (lucy_HitCollector*)lucy_HitColl_BitColl_new(bit_vec);

(Or something like that.)

For this reason, the final component of the class name has to convey the
identity of the class without any other context.  Lucy::Search::Query::Term
would be ok for a pure Perl hierarchy, but it won't work for Lucy -- that class
has to end in "TermQuery".
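
To make the constraint concrete, here is a minimal sketch of the mapping from
a Perl-style class name to a C type name under the one-level scheme.  The
helper class_to_c_symbol() is hypothetical and only illustrates the idea; it
is not how Lucy actually generates its symbols.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Lucy: build a C type name from a
 * Perl-style class name by keeping only the final "::" component and
 * prepending the "lucy_" prefix.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
static int
class_to_c_symbol(const char *class_name, char *out, size_t out_size)
{
    assert(class_name != NULL && out != NULL);

    const char *last = class_name;
    const char *sep;

    /* Skip past every "::" so that `last` points at the final component. */
    while ((sep = strstr(last, "::")) != NULL) {
        last = sep + 2;
    }

    int needed = snprintf(out, out_size, "lucy_%s", last);
    return (needed >= 0 && (size_t)needed < out_size) ? 0 : -1;
}
```

Fed "Lucy::Search::TermQuery", this yields "lucy_TermQuery", which still says
what the class is.  Fed "Lucy::Search::Query::Term", it yields "lucy_Term",
which has lost the context that made the name meaningful.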

And if we accept that all search-related components are going to start with
Lucy::Search, then an inheritance-driven subclass naming scheme starts to
yield painfully long fully-qualified class names.
"Lucy::Search::HitCollector::BitCollector" is 40 characters; a lot of people
limit their code to 78-80 characters per line, and class names that long start
to cause awkward wrappings.  We don't want to have too many of those.

I think the primary principle guiding our class hierarchy organization has to
be grouping by topic, as in Lucene.  A 'Class::Subclass::SubSubClass' scheme
just isn't workable.

To be fully consistent with Lucene, though, we'd have to put QueryParser under
Lucy::QueryParser::QueryParser, like Plucene and early versions of KinoSearch
did.  That always bugged me, which is why it moved in later versions of
KinoSearch.

But QueryParser could also go under Lucy::Search.  Maybe we should try to have
all second-level namespacing represent grouping only?  In other words, there
would be no instantiable classes with the pattern Lucy::Xxxx -- only
Lucy::Xxxx::Xxxx and deeper.
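
As a rough illustration, the "grouping only" rule could be checked
mechanically along these lines.  Both function names are made up for this
sketch, and it makes no attempt to reject malformed names such as empty
components.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Count the "::"-separated components of a class name.
 * "Lucy::Search::TermQuery" has three. */
static int
count_components(const char *class_name)
{
    assert(class_name != NULL);

    int count = 1;
    const char *p = class_name;
    while ((p = strstr(p, "::")) != NULL) {
        count++;
        p += 2;
    }
    return count;
}

/* Hypothetical check for the proposed rule: an instantiable class must
 * live under Lucy:: and have at least three components, so that the
 * second component is always a grouping and never a class. */
static bool
is_instantiable_name(const char *class_name)
{
    return strncmp(class_name, "Lucy::", 6) == 0
        && count_components(class_name) >= 3;
}
```

Under this check "Lucy::Search::TermQuery" passes, while a two-level name
such as "Lucy::Searcher" does not.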

That would change my initial proposal to this:


It would also imply moving around some other classes I didn't mention in my
original proposal for brevity's sake:


If we arrange things this way, at least no subclass is ever located above its
superclass in the hierarchy -- as was the case with Lucy::Searcher subclassing
Lucy::Search::Searchable.  They're always at the same level or below.
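
The "same level or below" property can be stated as a comparison of namespace
depths.  This is only a sketch: namespace_depth() and subclass_placement_ok()
are invented names, and "Lucy::Search::Searcher" in the note after the code is
a hypothetical relocation used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Depth of a class name, counted as its number of "::"-separated
 * components: "Lucy::Searcher" is 2, "Lucy::Search::Searchable" is 3. */
static int
namespace_depth(const char *class_name)
{
    assert(class_name != NULL);

    int depth = 1;
    const char *p = class_name;
    while ((p = strstr(p, "::")) != NULL) {
        depth++;
        p += 2;
    }
    return depth;
}

/* A subclass is acceptably placed when it sits at the same depth as its
 * superclass or deeper, never above it. */
static bool
subclass_placement_ok(const char *subclass, const char *superclass)
{
    return namespace_depth(subclass) >= namespace_depth(superclass);
}
```

With the old layout, subclass_placement_ok("Lucy::Searcher",
"Lucy::Search::Searchable") is false.  A hypothetical "Lucy::Search::Searcher"
would satisfy it.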



Additionally, we remove the ambiguity about what the second part of the class
name means -- it's always a grouping.  Think of Lucy::Search as LucySearch and
Lucy::Index as LucyIndex, if you like.

Marvin Humphrey
