lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-dev] Dependency injection and customizing a scorer
Date Mon, 09 May 2011 20:40:44 GMT
On Sun, May 08, 2011 at 10:34:37PM -0700, Nathan Kurz wrote:
> [*] It does seem like all the functions I pick are in files of
> different names.   I love the simplicity of Prefix == File.

Prefixes are tied directly to classes, so one-prefix-per-file means
one-class-per-file.  If we were to enforce one class per .cfh/.c file pair,
the number of files under trunk/core/ would grow by 68 pairs (136 files), or
about 30%.

    $ find core -print | grep cfh | wc -l
         229
    $ find core -print | grep cfh | xargs grep "class Lucy" | wc -l
         297

It would also constrain our ability somewhat to share static functions across
classes.  ORMatcher.c contains two classes: ORMatcher, which does not
implement Score(), and ORScorer, which does.  There are a lot of shared
routines.
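
To make the sharing concrete, here is a minimal sketch of two classes with
different prefixes living in one .c file and calling the same static helper.
The struct layouts, the function names, and the helper S_lowest_doc_id are
all hypothetical, invented for illustration; this is not the actual contents
of ORMatcher.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; the real classes are more involved. */
typedef struct {
    const int32_t *doc_ids;   /* current doc id of each child matcher */
    size_t         num_kids;
} ORMatcher;

typedef struct {
    ORMatcher    base;        /* ORScorer builds on ORMatcher */
    const float *scores;      /* current score of each child matcher */
} ORScorer;

/* File-private helper shared by both classes.  Assumes doc ids are
 * positive, so 0 can stand for "no children". */
static int32_t
S_lowest_doc_id(const int32_t *doc_ids, size_t count) {
    int32_t lowest = 0;
    for (size_t i = 0; i < count; i++) {
        if (lowest == 0 || doc_ids[i] < lowest) { lowest = doc_ids[i]; }
    }
    return lowest;
}

/* Prefix "ORMatcher_": no scoring, only matching. */
int32_t
ORMatcher_get_doc_id(const ORMatcher *self) {
    return S_lowest_doc_id(self->doc_ids, self->num_kids);
}

/* Prefix "ORScorer_": same file, same helper. */
int32_t
ORScorer_get_doc_id(const ORScorer *self) {
    return S_lowest_doc_id(self->base.doc_ids, self->base.num_kids);
}

/* Only ORScorer scores: sum the scores of children on the current doc. */
float
ORScorer_score(const ORScorer *self) {
    int32_t doc_id = ORScorer_get_doc_id(self);
    float   total  = 0.0f;
    for (size_t i = 0; i < self->base.num_kids; i++) {
        if (self->base.doc_ids[i] == doc_id) { total += self->scores[i]; }
    }
    return total;
}
```

Splitting this into one file per class would mean giving S_lowest_doc_id
external linkage or duplicating it.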

Various languages have different approaches to tying classes and files.
    
    * Python has both modules and classes.  Files are explicitly organized
      around modules and *not* classes.
    * Perl has only packages.  You can put multiple packages in the same file,
      but thanks to the way the 'require' and 'use' keywords work, you're
      gonna be in a world of hurt if you don't put each public class in its
      own file.
    * Java enforces one externally visible class per file, but multiple
      private classes within a file are allowed.

Now that you're forcing me to think about it, I prefer the Python paradigm.  I
don't like forcing files to be organized around classes, or user import
interfaces to be organized around classes.  Library authors should be given
the flexibility to define custom module interfaces exposing library
functionality.  Classes aren't special, they're just another tool in the
toolbox.

So, following that reasoning, I think it's important to allow multiple
prefixes in one file.

> [**] OK, I'm getting off track here, but why is OR capitalized?

The "OR" in "ORQuery" represents the boolean operator "OR" -- it doesn't have
semantic meaning as "Or" would in "PrefixOrSuffixMatcher".

The OR*, AND*, and NOT* class names are that way because that's how those
operators are traditionally represented in boolean search query languages.
You might see this...

    foo AND NOT (bar OR baz)

... and possibly this, though it's ambiguous when terms aren't quoted...

    foo and not (bar or baz)

... but I've never seen this:

    foo And Not (bar Or baz)
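
One way to sidestep the lowercase ambiguity, sketched here under the
assumption that only the all-caps spellings count as operators, is a
case-sensitive token check.  TokenType and classify_token are hypothetical
names for illustration, not a description of any particular query parser.

```c
#include <string.h>

/* Hypothetical token categories for a boolean query language. */
typedef enum { TOKEN_TERM, TOKEN_AND, TOKEN_OR, TOKEN_NOT } TokenType;

/* Treat a token as an operator only when it is spelled in all caps;
 * anything else, including "and" or "Or", is an ordinary search term. */
TokenType
classify_token(const char *token) {
    if (strcmp(token, "AND") == 0) { return TOKEN_AND; }
    if (strcmp(token, "OR")  == 0) { return TOKEN_OR;  }
    if (strcmp(token, "NOT") == 0) { return TOKEN_NOT; }
    return TOKEN_TERM;
}
```

Under this rule "foo and not (bar or baz)" contains five plain terms and no
operators, which is the ambiguity mentioned above.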

Marvin Humphrey

