incubator-lucy-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nathan Kurz <n...@verse.com>
Subject Re: [lucy-dev] Dependency injection and customizing a scorer
Date Thu, 19 May 2011 20:15:47 GMT
On Mon, May 9, 2011 at 1:31 PM, Marvin Humphrey <marvin@rectangular.com> wrote:
> On Sun, May 08, 2011 at 10:34:37PM -0700, Nathan Kurz wrote:
>
> Lucy::Search::ORScorer had definitely not been written to be subclassed.
> ORScorer and ORMatcher are Lucy's worst search time bottlenecks.  They've been
> optimized and unrolled to make them as fast as possible.  ORScorer_score()
> accesses C member variables which are not accessible from the Perl layer.  It
> wouldn't be easy to make a slight modification.

It's possible that my bombastic tone has gotten in the way of my
point.  I'll try to reduce the ranting to a minimum, and will strive
not to slobber unless I get excited.

I have no complaints with the way that ORScorer is written.  Since it
has no external dependencies, I think it's fine the way it is.  My
concern is how the particular implementation of ORScorer is selected
by the levels above it.  Currently, the entire hierarchy is
hard-coded, so any changes require subclassing all the way from the
top down to the point of change.

> In practice, though, you just write a pure Perl subclass and Lucy does all the
> work behind the scenes, creating the subclass VTable with VTable_singleton(),
> installing a callback function pointer for every method you've overridden, then
> finally storing the new VTable in the VTable_registry.

Yes, this is fantastic, and I want to make sure that our new readers
like David and Joe understand just how slick this is, and how easy
this makes prototyping.  I'm not suggesting that we change this,
rather that with a small amount of additional work we can turn it from
a "neat trick" to "mindblowing feature".  The ability to override the
C library function-by-function in the host language of your choice is
going to make Lucy unstoppable.

> Writing extensible classes which are designed to be subclassed by users is a
> very difficult computer science problem in general.

Yes, it's a hard problem, but over the last 10 years or so there has
been a lot of progress on best practices under the general rubric of
"Dependency Injection".      Particularly for a hybrid of compiled and
interpreted code, we need a runtime means of choosing which
implementation will perform a given service according to a fixed
interface.  I think it's worth debating the means by which to solve
this problem, but  we definitely benefit by improving the current
approach.   But again, the problem isn't the ease of subclassing
(which is great) rather the inability to substitute dependencies.

> I think we have two separate problems:
>
>  1. It's difficult to integrate custom Query/Matcher classes into Lucy.
>  2. It's hard to make minor mods to Lucy::Search::ORScorer without resorting
>     to heinous monkey patching.
>
> I think it's important to solve the custom-Query-integration problem but not
> the ORScorer-subclassing problem.

It's not just Query/Matcher, and certainly not just ORScorer.   I'm
really arguing for the the simple rule "Avoid fixed dependencies."
They make testing difficult, and they negate much of the potential of
host language prototyping.   Yes, we want to avoid "monkey patching",
but the problematic part is the "monkey" not the "patching".   There
are many ways to do dependency injection, but the existing Registry
looks to be inches away from allowing us to do some very sane patching
without the risky monkey business.  I think it requires only a minor
mindset change from "this class always uses Lucy::Search::ORScorer,
which is a compiled class and hence immutable" to "look up ORSCORER in
registry and use the runtime defined class you find".

> It would be a Bad Idea to replace an entry in the ORMATCHER VTable.  VTable
> objects, once exposed, must not be modified in any way.  Instead, it's
> important to create a *copy* of ORMATCHER using VTable_singleton() and replace
> an entry in that.

It doesn't really matter much to me whether individual functions
within ORMATCHER can be changed, or whether the entire entry needs to
be swapped out and replaced.  What is important is that there be some
stage at which the "label" ORMATCHER can safely be changed to point to
a different "entry".  I'm hoping there is room for a setup stage where
this is possible before the VTable is "exposed".

I would like to know more where the badness comes from, though.  I'd
prefer to allow entries to be modified function-by-function, as I
think that could lower the bar for host language work even further,
and might make it easier to do overrides from C or other compiled
languages without requiring the use of the full Clownfish
infrastructure.  I'm fuzzy, but I thought that eliminating 'final'
methods made this possible?

> Our subclassing interface is the way to go here.  It's a lot easier to use
> than an interface exposing a raw C function pointer would be.

The subclassing is not the problem:  it's the difficulty of using the
subclass once one has created it.  My goal would be to make it
possible to switch to a different ORScorer (again, arbitrary example)
at runtime without having to fully understand the full class hierarchy
or Clownfish.  By slightly repurposing the existing Registry, I think
this would be possible quite easily.  The only other thing required is
to convince you that it's worth moving what I feel is the current best
practice of avoiding static dependencies between classes.  :)

Your phrase "raw C function pointer" makes me think that I haven't
conveyed my point well.  Going back to the concrete, I'm suggesting
that the VTable for Lucy::Search::ORCompiler would have a VTable entry
such as "make_scorer" that would initially be set to "ORScorer_new".
This could be done as a one line function that just calls
ORScorer_new, or if possible by just having the VTable point to the
actual function in ORScorer.   I don't see it as being a raw C
function pointer any more than any other member function within
Clownfish.

> Again, Lucy::Docs::Cookbook::CustomQuery should be helpful in illustrating how
> it all fits together.  But you might also be interested in gory details.
>
> (I'm guessing that file might have been a missing piece for you, since it's
> generated at build time, not stored in the repository.)

I apologize:  yes, I'm missing a lot.   A faulty neighborhood electric
system peaking floating to 140 V seems to have fried my motherboard,
and the quickest solution was to switch to a  Windows machine.  I last
used Windows under NT, and barely at that.  As a result, I don't
currently have a development environment, am flailing at what should
be simple tasks, am relying overly on faulty memory, and use the web
interface to SVN to browse the files.   And I'm just generally out of
it.  I may never be be fully coherent again, but I'm confident I'll
sound less loony as soon as I have time to reinstall and configure
Linux.

> I think you'll agree that the interface needs work. :)  We're not ready to
> expose a public C API yet.

Actually the C API doesn't seem that bad.  If it were documented, it
seems like a fine place to start.

--nate

Mime
View raw message