incubator-lucy-dev mailing list archives

From Marvin Humphrey <>
Subject Re: Class hierarchy
Date Sat, 12 Sep 2009 21:20:47 GMT
On Fri, Sep 11, 2009 at 09:22:18PM -0700, Nathan Kurz wrote:
> On Fri, Sep 11, 2009 at 3:59 PM, Marvin Humphrey <> wrote:
> > "Lucy::Search::HitCollector::BitCollector" is 40 characters; a lot of people
> > limit their code to 78-80 characters per line, and class names that long start
> > to cause awkward wrappings.  We don't want to have too many of those.
> BitCollector derives from HitCollector and this is something we want
> to preserve? There's really no way that they can't just both derive
> from Collector the way it reads at first and second glance?  

Funny, but the abstract class was originally named "HitCollector" in Lucene
and is now "Collector".  Most everyone was sorry to see "HitCollector"
replaced by "Collector" (which had to be done as part of one of Lucene's
deprecation cycles).

I have no objection to renaming HitCollector to Collector, from which
BitCollector, OffsetCollector, SortCollector, etc. will descend.

> > But QueryParser could also go under Lucy::Search.  Maybe we should try to have
> > all second-level namespacing represent grouping only?  In other words, there
> > would be no instantiable classes with the pattern Lucy::Xxxx -- only
> > Lucy::Xxxx::Xxxx and deeper.
> This would make me happier.  And this would make things easier to
> align with the C scheme, right?

I'm not sure how it would make things easier... can you elucidate?  

But I like the consistency and I'm willing to sacrifice the Huffman coding.

> >   Lucy::Object::Obj
> >     Lucy::Search::Searchable
> >       Lucy::Search::Searcher
> >       Lucy::Search::PolySearcher
> Lucy::Object::Obj
>   Lucy::Search::Searcher
>     Lucy::Search::SimpleSearcher
>     Lucy::Search::PolySearcher

The Lucene analogue to your "SimpleSearcher" (and KinoSearch::Searcher) is
called "IndexSearcher".  I see the rationale behind that name choice now: it's
a single-index Searcher.

Also, I just checked the back history for Lucene.  Searchable was broken out
as an interface from Searcher in order to make implementing remote search
easier.  So, Searcher predates Searchable, and was the original abstract
base class.

In this case, following Lucene's example and going with Lucy::Search::Searcher
as the base class and Lucy::Search::IndexSearcher as the primary user class
seems sensible.  (: We won't follow Lucene's example and name all the methods
"search", though. :)

It bothers me a little bit that the class which will be used more than any
other, Lucy::Search::IndexSearcher, has a somewhat cumbersome name.  But the
improved clarity in the class hierarchy is worth it.
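
To picture the shape of that hierarchy, here is a rough Python sketch.  It is
not Lucy or KinoSearch code: the hits() method, the dict-based "index" and the
(searcher position, doc id) tuples are all invented for illustration.  Only
the three class names come from this thread.

```python
from abc import ABC, abstractmethod


class Searcher(ABC):
    """Abstract base class: anything that can answer a query."""

    @abstractmethod
    def hits(self, term):
        """Return the matches for a term (hypothetical method name)."""


class IndexSearcher(Searcher):
    """Searches exactly one index, modeled here as a dict of term -> doc ids."""

    def __init__(self, index):
        self.index = index

    def hits(self, term):
        return sorted(self.index.get(term, []))


class PolySearcher(Searcher):
    """Aggregates results from several child Searchers."""

    def __init__(self, searchers):
        self.searchers = list(searchers)

    def hits(self, term):
        # Tag each hit with the position of the child that produced it.
        return [
            (i, doc)
            for i, searcher in enumerate(self.searchers)
            for doc in searcher.hits(term)
        ]
```

Because PolySearcher only relies on the Searcher interface, its children could
be IndexSearchers or other PolySearchers.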

> Could we agree that Lucy::Dir::SubClass should subclass
> Lucy::Dir::Class?  That in general the subclass should add a word in
> front of the class it derives from?  

Like Huffman, I think that's a nice-to-have.  I'm glad that you've articulated
the principle, and I agree that we should seek to apply it when possible, but
it can't be a hard and fast rule.

For example, ORQuery, ANDQuery, NOTQuery and RequiredOptionalQuery all descend
from PolyQuery -- which is very important, because it allows you to walk a
hierarchy composed of disparate PolyQueries.  However, the proposed naming
scheme implies Lucy::Search::ORPolyQuery, which is no good.
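
A rough Python sketch of why the shared PolyQuery parent matters (this is not
Lucy code; the `children` attribute, the TermQuery stand-in and the walk()
helper are assumptions made up for the example).  A single isinstance check
against PolyQuery is enough to recurse through a mixed tree:

```python
class Query:
    """Stand-in for the abstract query base class."""


class TermQuery(Query):
    """Leaf query; holds a single term (hypothetical)."""

    def __init__(self, term):
        self.term = term


class PolyQuery(Query):
    """Common parent for queries that hold child queries."""

    def __init__(self, *children):
        self.children = list(children)


class ORQuery(PolyQuery):
    pass


class ANDQuery(PolyQuery):
    pass


class NOTQuery(PolyQuery):
    pass


def walk(query, depth=0):
    """Yield (depth, class name) for every node, depth first."""
    yield depth, type(query).__name__
    if isinstance(query, PolyQuery):
        for child in query.children:
            yield from walk(child, depth + 1)


tree = ANDQuery(
    ORQuery(TermQuery("foo"), TermQuery("bar")),
    NOTQuery(TermQuery("baz")),
)
for depth, name in walk(tree):
    print("  " * depth + name)
```

Without the common parent, walk() would need a separate branch for each
compound query class.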

> >   Lucy::Object::Obj
> >     Lucy::Plan::FieldType
> >       Lucy::Plan::TextType
> >         Lucy::Plan::FullTextType
> >         Lucy::Plan::StringType
> Working blindly but consistently:
> Lucy::Object::Obj
>    Lucy::Plan::Type
>      Lucy::Plan::TextType
>         Lucy::Plan::FullTextType
>         Lucy::Plan::StringTextType

I dunno about "StringTextType".  The rationale behind "FullTextType" is that
it supports "full text search", not that it's a "text type" which is "full".

I think I like "Type", though.  I'm a little hesitant because it's generic in
comparison to "FieldType", and the word "type" has other meanings in the
context of programming C or Java that wouldn't interfere in e.g. Perl.
However, looking over the KS code base, I see that I've used "type" instead of
the more wordy "field_type" the vast majority of the time without any
problems... and as with Searcher, I like the way the class hierarchy looks
with "Type" as the base class.

Our full type hierarchy would look something like this, then:


All of those except for BooleanType are in KinoSearch right now; the numeric
types are incomplete and aren't public yet though -- I haven't figured out how
to get them through PostingsWriter intact.

We might eventually add Int8Type and Int16Type for completeness.  We probably
want to support unsigned integer types with a boolean flag rather than
doubling our class count with UInt32Type, UInt64Type, etc, but we'll cross
that bridge later.
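
For what it's worth, the boolean flag approach might look something like the
Python sketch below.  Everything here is hypothetical (the IntType parent, the
`unsigned` parameter, the BITS attribute and value_range() are invented for
the example); the point is only that one flag covers the unsigned variants
without adding UInt32Type or UInt64Type.

```python
class IntType:
    """Hypothetical parent for fixed-width integer field types."""

    BITS = None  # set by subclasses

    def __init__(self, unsigned=False):
        self.unsigned = unsigned

    def value_range(self):
        """Return (min, max) for this width and signedness."""
        if self.unsigned:
            return 0, 2 ** self.BITS - 1
        half = 2 ** (self.BITS - 1)
        return -half, half - 1


class Int32Type(IntType):
    BITS = 32


class Int64Type(IntType):
    BITS = 64
```

Adding Int8Type or Int16Type later would then be one small subclass each,
rather than two.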

In planning for the future, we may want to consider the range of SQL data
types supported by various RDBMS engines, and how we'd support each of those.

> Thanks for dealing rationally with my silly quibbles.  Overall your
> scheme seems workable, and likely an improvement on the current state
> of affairs.

It's not the first time we've collaborated on an improved class hierarchy, and
it won't be the last.

Marvin Humphrey
