lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-dev] Dependency injection and customizing a scorer
Date Mon, 09 May 2011 20:31:51 GMT
On Sun, May 08, 2011 at 10:34:37PM -0700, Nathan Kurz wrote:
> The straightforward but brutish option would be to make a one
> character change in ORScorer_score() in ORMatcher.c[*]:
> - score += scores[i];
> + score *= scores[i];
> Recompile, and I'm done.  Simple and fast, but requires doing it in C
> and would require distribution by patch and pray.
>
> If instead I wanted to do it in a Host language, it looks like
> Clownfish provides this quite directly:  define a host function, and
> then use VTable_override to replace the entry in the 'master' vtable
> in the registry for ORMATCHER.  Everything else runs in C, and my
> function gets called in Perl, Python, or Ruby.   Really cool!

It would be a Bad Idea to replace an entry in the ORMATCHER VTable.  VTable
objects, once exposed, must not be modified in any way.  Instead, it's
important to create a *copy* of ORMATCHER using VTable_singleton() and replace
an entry in that.
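
To make the copy-then-override idea concrete, here is a toy sketch in plain C. It is not Lucy code: ToyVTable, toy_vtable_subclass() and the two scoring functions are all made-up names, and the real VTable_singleton() does a good deal more (registry lookup, host-language integration). It only shows why the shared table stays untouched while the subclass table gets the new method.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a vtable: a class name plus one method slot. */
typedef float (*toy_score_t)(const float *scores, size_t n);

typedef struct ToyVTable {
    const char  *name;
    toy_score_t  score;
} ToyVTable;

/* Default behaviour: sum the child scores. */
float
toy_sum_score(const float *scores, size_t n) {
    float score = 0.0f;
    for (size_t i = 0; i < n; i++) { score += scores[i]; }
    return score;
}

/* Custom behaviour: multiply the child scores. */
float
toy_product_score(const float *scores, size_t n) {
    float score = 1.0f;
    for (size_t i = 0; i < n; i++) { score *= scores[i]; }
    return score;
}

/* The shared parent table.  It is const: once exposed, nobody edits it. */
const ToyVTable TOY_ORSCORER = { "ToyORScorer", toy_sum_score };

/* Make a private copy of the parent table under a new class name.
 * The caller owns the copy and may override entries in it. */
ToyVTable*
toy_vtable_subclass(const ToyVTable *parent, const char *name) {
    ToyVTable *copy = malloc(sizeof(ToyVTable));
    assert(copy != NULL);
    memcpy(copy, parent, sizeof(ToyVTable));
    copy->name = name;
    return copy;
}
```

A caller would obtain a copy with toy_vtable_subclass(&TOY_ORSCORER, "MyToyORScorer"), set its score slot to toy_product_score, and leave every existing user of TOY_ORSCORER with the summing behaviour.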

In practice, though, you just write a pure Perl subclass and Lucy does all the
work behind the scenes, creating the subclass VTable with VTable_singleton(),
installing a callback function pointer for every method you've overridden, then
finally storing the new VTable in the VTable_registry.

> Or maybe this isn't possible yet.  Is it? 

It's possible to override ORScorer's Score() method in a subclass.  Demo code
is below my sig.  Here's the output, which shows that the custom
MyORScorer::score() subroutine defined in Perl (which always returns a score of
1.0) was invoked:

    $ perl -Mblib custom_or_scorer.pl 
    STANDARD:
       5.09999990463257
       10
       100
    CUSTOM:
       1
       1
       1
    $

>  And is there an established stage where I would do fixups like this?  Is
>  there a callback to do it, or does one stick it in somewhere before the
>  search?  Is the registry global, or per Searcher?

I think many of your questions would be answered by working through
Lucy::Docs::Cookbook::CustomQuery.

> But let's assume that for some reason I need to store a little extra
> data, and thus want to create an entire MyORScorer object and have it
> used instead of OrScorer[**].    It seems like I would have to modify
> ORCompiler_make_matcher() in ORQuery.c.[*]  

Yes, that's right.

I think we have two separate problems:

  1. It's difficult to integrate custom Query/Matcher classes into Lucy.
  2. It's hard to make minor mods to Lucy::Search::ORScorer without resorting
     to heinous monkey patching.

I think it's important to solve the custom-Query-integration problem but not
the ORScorer-subclassing problem.

> But it seems error prone to have to rewrite that whole function in some
> other language.

Writing extensible classes which are designed to be subclassed by users is a
very difficult computer science problem in general.  The principle of
encapsulation dictates that implementation details are supposed to be private,
but it is challenging to write parent classes where the implementation details
are completely hidden from child classes.

Lucy::Search::ORScorer was definitely not written to be subclassed.
ORScorer and ORMatcher are Lucy's worst search time bottlenecks.  They've been
optimized and unrolled to make them as fast as possible.  ORScorer_score()
accesses C member variables which are not accessible from the Perl layer.  It
wouldn't be easy to make a slight modification.

> Since we already have a registry in place, perhaps we could institute
> a simple level of Dependency Injection
> (http://en.wikipedia.org/wiki/Dependency_injection) that would make
> this easier?  Instead of hard coding it, could we add a function entry
> in OrQuery.cfh for or_scorer_new() and let it be easily overridden?

Our subclassing interface is the way to go here.  It's a lot easier to use
than an interface exposing a raw C function pointer would be.

> Also, presume I want to prototype that MyORScorer class in the host
> language rather than C. How do I create the C callable VTable for this
> Host class?  I think all the pieces are there, but I'm not seeing
> quite how it happens.

Again, Lucy::Docs::Cookbook::CustomQuery should be helpful in illustrating how
it all fits together.  But you might also be interested in gory details.

Try building Lucy...

    cd lucy/perl/
    perl Build.PL
    ./Build code

... and then open up the file autogen/Lucy/Search/Matcher.c.  It will contain
the following function definition:

    float
    lucy_Matcher_score_OVERRIDE(lucy_Matcher* self) {
        return (float)cfish_Host_callback_f64(self, "score", 0); 
    }

A pointer to that function is what gets inserted into the dynamically created
VTable for the pure-Perl subclass MyORScorer.

(I'm guessing that file might have been a missing piece for you, since it's
generated at build time, not stored in the repository.)
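
If it helps to see the shape of that trampoline in isolation, here is a toy sketch.  It is not what Clownfish generates: the "host" is simulated by a small table mapping method names to C functions, and ToyObj, ToyHostMethod and toy_host_callback_f64() are hypothetical names invented for this example.  The point is only that the function sitting in the vtable slot is ordinary C which forwards the call, by method name, to whatever the host side defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "host-language method": in real life this would be a Perl,
 * Python or Ruby subroutine; here it is just a C function. */
typedef double (*toy_host_fn)(void *self);

typedef struct ToyHostMethod {
    const char  *name;
    toy_host_fn  fn;
} ToyHostMethod;

/* Toy object which carries the methods its host-side class defined. */
typedef struct ToyObj {
    const ToyHostMethod *host_methods;
    size_t               num_host_methods;
} ToyObj;

/* Look up a host method by name and invoke it.  Returns 0.0 if the host
 * class does not define the method (an assumption of this sketch). */
double
toy_host_callback_f64(ToyObj *self, const char *method) {
    assert(self != NULL && method != NULL);
    for (size_t i = 0; i < self->num_host_methods; i++) {
        if (strcmp(self->host_methods[i].name, method) == 0) {
            return self->host_methods[i].fn(self);
        }
    }
    return 0.0;
}

/* The trampoline: this is what would sit in the subclass vtable slot. */
float
toy_score_OVERRIDE(ToyObj *self) {
    return (float)toy_host_callback_f64(self, "score");
}

/* Stand-in for a host-side "sub score { 1.0 }". */
double
toy_always_one(void *self) {
    (void)self;
    return 1.0;
}
```

Given an object whose method table maps "score" to toy_always_one, calling toy_score_OVERRIDE() on it returns 1.0f, while the C caller never needs to know that the implementation lives on the host side.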

> Finally, let's assume I like my prototype, and then write it in C and
> compile it as a shared library.  How does this get loaded and linked
> in?  What do I need to do to register its VTable?

Here's some code taken from TestFileHandle.c which creates a subclass of
FileHandle and overrides FileHandle's Close() method to do nothing:

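    /* Replacement for Close() which does nothing. */
    static void
    S_no_op_method(const void *vself) {
        UNUSED_VAR(vself);
    }

    static FileHandle*
    S_new_filehandle() {
        ZombieCharBuf *klass = ZCB_WRAP_STR("MyFileHandle", 12);
        FileHandle *fh;
        /* Reuse the subclass VTable if it already exists; otherwise
         * create it as a copy of FILEHANDLE. */
        VTable *vtable = VTable_fetch_vtable((CharBuf*)klass);
        if (!vtable) {
            vtable = VTable_singleton((CharBuf*)klass, FILEHANDLE);
        }
        /* Swap in the no-op for Close(), then build an object of the
         * new class. */
        VTable_Override(vtable, S_no_op_method, Lucy_FH_Close_OFFSET);
        fh = (FileHandle*)VTable_Make_Obj(vtable);
        return FH_do_open(fh, NULL, 0);
    }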

Pay special attention to the VTable_Override() method invocation, which stores
a pointer to the S_no_op_method() static function at Lucy_FH_Close_OFFSET bytes
into the VTable.
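
For anyone who hasn't seen the store-at-a-byte-offset trick before, here is a toy sketch of it in standalone C.  None of this is Lucy's implementation: ToyFHVTable, TOY_CLOSE_OFFSET and toy_vtable_override() are invented for illustration, and the offset comes from offsetof() on a plain struct rather than from generated OFFSET variables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*toy_method_t)(void *self);

/* Hypothetical vtable layout: a name followed by two method slots. */
typedef struct ToyFHVTable {
    const char   *name;
    toy_method_t  open;
    toy_method_t  close;
} ToyFHVTable;

/* Byte offset of the close slot within the table. */
#define TOY_CLOSE_OFFSET offsetof(ToyFHVTable, close)

/* Store a function pointer `offset` bytes into the vtable. */
void
toy_vtable_override(ToyFHVTable *vtable, toy_method_t func, size_t offset) {
    assert(offset + sizeof(func) <= sizeof(ToyFHVTable));
    memcpy((char*)vtable + offset, &func, sizeof(func));
}

/* Counts how often the "real" method ran, so the effect is observable. */
int toy_close_calls = 0;

void
toy_real_close(void *self) {
    (void)self;
    toy_close_calls++;
}

/* Replacement method which does nothing. */
void
toy_no_op(void *self) {
    (void)self;
}
```

After toy_vtable_override(&vt, toy_no_op, TOY_CLOSE_OFFSET), calls through vt.close go to the no-op while vt.open is left alone.  Addressing slots by offset rather than by member name is what lets one generic override routine serve every method.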

I think you'll agree that the interface needs work. :)  We're not ready to
expose a public C API yet.

Marvin Humphrey


################################################

# custom_or_scorer.pl
use strict;
use warnings;

use Lucy;
use LucyX::Search::MockMatcher;

package MyORScorer;
use base qw( Lucy::Search::ORScorer );

# Override Score() so that every hit scores 1.0.
sub score { 1.0 }

package main;

my $standard_or_scorer = Lucy::Search::ORScorer->new(
    children => [
        LucyX::Search::MockMatcher->new(
            doc_ids => [ 1,   2 ],
            scores  => [ 5.0, 10.0 ],
        ),  
        LucyX::Search::MockMatcher->new(
            doc_ids => [ 1,   3 ],
            scores  => [ 0.1, 100.0 ],
        ),  
    ]   
);  
my $custom_or_scorer = MyORScorer->new(
    children => [
        LucyX::Search::MockMatcher->new(
            doc_ids => [ 1,   2 ],
            scores  => [ 5.0, 10.0 ],
        ),  
        LucyX::Search::MockMatcher->new(
            doc_ids => [ 1,   3 ],
            scores  => [ 0.1, 100.0 ],
        ),  
    ]   
);  
print "STANDARD:\n";
while ($standard_or_scorer->next) {
    print "   " . $standard_or_scorer->score . "\n";
}   
print "CUSTOM:\n";
while ($custom_or_scorer->next) {
    print "   " . $custom_or_scorer->score . "\n";
}   


