incubator-lucy-dev mailing list archives

From: Marvin Humphrey
Subject: Native callbacks
Date: Tue, 22 Apr 2008 05:01:32 GMT

The VTable-based method dispatch system devised for Lucy is much, much  
faster than the hash-based dispatch systems typical of dynamic  
languages.  However, since it is a parallel system, overriding an  
internal Lucy method using the host language's native OO mechanism  
presents problems.

   * How do we design a generic callback mechanism for invoking
     object methods which is portable across multiple host environments?
   * How does Lucy know that the user wants to override a method?
   * Will the native method prove unacceptably slow for inner loop code?

We can't do anything about the last problem, but it only affects  
performance-critical inner loops, and even then a system that doesn't  
scale well may still be useful for smaller collections or rapid  
prototyping.  As for the first two problems, I believe I have at least  
partial solutions devised.

If we implement Lucy-level abstract methods as functions which call  
back to the host language using instance-method semantics, then Lucy  
will "see" the correct method even if it has been overridden multiple  
times at the host-language level.

   /* Create the appropriate wrapper around [self], call "get_doc_num"
    * on the wrapper, convert the callback's return value to an integer,
    * and return that integer to the C-level invocant.
    */
   int
   Scorer_get_doc_num(Scorer *self)
   {
       return Native_callback_i(self, "get_doc_num", 0);
   }

I'm using this technique all over KinoSearch and it has proven quite  
successful.  For instance, Scorer is spec'd out at the C level, but  
I've been able to build a pure-Perl MockScorer subclass, and a user  
has even released a pure-Perl WildCardQuery implementation to CPAN.

Abstract callbacks require a couple of tricks, and they aren't perfect.

First... the callback technique works fine when you are invoking the  
method from inside the Lucy C core, but...  What if you want to invoke  
a method via the host language that *should* have been overridden, but  
might not have been?  You can end up in an infinite loop with the  
callback invoking the binding invoking the callback and so on.

The solution is to insert an ABSTRACT_METHOD_CHECK in the binding code  
before the vtable-method invocation.  The test assesses whether the  
function pointer in the vtable matches the address of the original  
implementing function.

   * If it matches, we'll get an infinite loop, so throw an error.
   * If it doesn't match, then the method has been overridden at
     the C level and it's safe to invoke.
   * If the method was overridden at the host-language level... well,
     this scenario never comes into play, because the original
     binding calling into C has been overridden.
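
A minimal sketch of how such a check might look, assuming a pared-down VTable with a single slot. The struct layouts, the macro's argument list, and binding_get_doc_num() are hypothetical stand-ins for illustration; only the name ABSTRACT_METHOD_CHECK and the pointer-comparison idea come from the description above.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real Lucy types. */
typedef struct Scorer Scorer;
typedef int (*Scorer_get_doc_num_t)(Scorer *self);

typedef struct VTable {
    Scorer_get_doc_num_t get_doc_num;
} VTable;

struct Scorer {
    VTable *vtable;
    int     doc_num;
};

/* The abstract implementation.  The real one would call back to the
 * host; here it only needs an address we can compare against. */
int
Scorer_get_doc_num(Scorer *self)
{
    (void)self;
    return -1;
}

/* A C-level override, as a concrete subclass would supply. */
int
TermScorer_get_doc_num(Scorer *self)
{
    return self->doc_num;
}

/* True if the vtable slot still points at the abstract function. */
#define ABSTRACT_METHOD_CHECK(obj, method, abstract_func) \
    ((obj)->vtable->method == (abstract_func))

/* Binding glue which the host-language wrapper would call.
 * Returns 0 on success, -1 if the method is still abstract. */
int
binding_get_doc_num(Scorer *self, int *result)
{
    if (ABSTRACT_METHOD_CHECK(self, get_doc_num, Scorer_get_doc_num)) {
        /* A real binding would throw a host-language exception. */
        return -1;
    }
    *result = self->vtable->get_doc_num(self);
    return 0;
}
```

With a vtable whose slot still holds Scorer_get_doc_num, the binding refuses to dispatch instead of looping; with TermScorer_get_doc_num in the slot, it dispatches normally.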

Second... there are a lot of details about how you implement various  
Native_callback_xxxxx functions to handle different kinds of  
arguments... but we'll save that for another post.

Third... Say that you want to subclass not Scorer, but *TermScorer*,  
and you try to override TermScorer_Get_Doc_Num() via the host-language  
OO mechanism.  The problem is that the function pointer in  
TermScorer's VTable for Get_Doc_Num doesn't call back to the host  
language -- so it never finds out that you've tried to override it.   
You'll get the native override when invoking from the host language,  
but not when invoking from within the library via the VTable.

Unfortunately, I haven't thought of a solution to this one.  :(  The  
best I can think of is some sort of override technique which stuffs a  
callback function into the subclass's VTable.

   package MyTermScorer;
   use base qw( Lucy::Search::TermScorer );
   __PACKAGE__->override(qw( get_doc_num ));

To me, that sounds both fiddly and like an implementation detail  
leaking out.
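
For concreteness, here is a rough sketch of what an override() along those lines might do on the C side. Everything in it is an assumption made for illustration: VTable_override(), Callback_get_doc_num(), the host_get_doc_num field, and the mock Native_callback_i() (whose signature is guessed from the call shown earlier) do not come from any existing code.

```c
#include <stddef.h>
#include <string.h>

typedef struct Scorer Scorer;
typedef int (*Get_Doc_Num_t)(Scorer *self);

typedef struct VTable {
    Get_Doc_Num_t get_doc_num;
} VTable;

struct Scorer {
    VTable       *vtable;
    int           doc_num;
    Get_Doc_Num_t host_get_doc_num; /* stand-in for the host object's method */
};

/* Mock callback: a real binding would look up [meth] through the
 * host language's method resolution instead of a struct field. */
int
Native_callback_i(Scorer *self, const char *meth, int num_args)
{
    (void)num_args;
    if (strcmp(meth, "get_doc_num") == 0 && self->host_get_doc_num != NULL) {
        return self->host_get_doc_num(self);
    }
    return -1;
}

/* Concrete C implementation inherited from the parent class. */
int
TermScorer_get_doc_num(Scorer *self)
{
    return self->doc_num;
}

/* Stand-in for a method written in the host language. */
int
MyTermScorer_host_get_doc_num(Scorer *self)
{
    return self->doc_num + 1000;
}

/* The function stuffed into the subclass's vtable slot. */
int
Callback_get_doc_num(Scorer *self)
{
    return Native_callback_i(self, "get_doc_num", 0);
}

/* Copy the parent's vtable, then replace the named slot with a
 * callback.  Returns 0 on success, -1 for an unknown method name. */
int
VTable_override(VTable *subclass, const VTable *parent, const char *meth)
{
    *subclass = *parent;
    if (strcmp(meth, "get_doc_num") == 0) {
        subclass->get_doc_num = Callback_get_doc_num;
        return 0;
    }
    return -1;
}
```

After VTable_override() runs, an invocation through the subclass's vtable from inside the C core reaches the host-level method, while objects still using the parent's vtable keep the fast C implementation.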

Nevertheless, the technique of abstract methods calling back to the  
host is so useful that I think we should just live with the drawbacks  
if they can't be resolved.

To implement these abstract callbacks, we need to be able to write a  
header file defining a generic interface which is compatible with  
every target language: probably this header would live at trunk/c_src/ 

Then we need to implement the Native.h interface with different C code  
for each target.  We could potentially break things up with giant  
#ifdef LUCY_RUBY and such within trunk/c_src/Lucy/Util/Native.c, but I  
think that file would grow out of control, as would others like it.   
Instead, I think we should establish a second tree for C code within  
each binding folder.  For the Perl binding, the file would probably  
live at trunk/perl/xs/Lucy/Util/Native.c.
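
The split between one shared declaration and one implementation per binding might look roughly like the sketch below, shown as a single file for brevity. The signature of Native_callback_i() is a guess modeled on the call in Scorer_get_doc_num(), and the "toy host" registry is purely hypothetical; a real binding would go through the Perl or Ruby C API instead.

```c
#include <stddef.h>
#include <string.h>

/* ---- Shared interface: what a Native.h might declare ---- */

/* Invoke method [meth] on the host object wrapping [self] and return
 * the result as an integer.  (Assumed signature.) */
int
Native_callback_i(void *self, const char *meth, int num_args);

/* ---- One binding's implementation, kept in that binding's tree ---- */

/* Toy "host": a fixed registry mapping method names to C functions. */
typedef int (*ToyHost_method_t)(void *self);

typedef struct ToyHost_entry {
    const char       *name;
    ToyHost_method_t  func;
} ToyHost_entry;

/* Stand-in for a host-language method; treats [self] as an int. */
int
ToyHost_get_doc_num(void *self)
{
    return *(int*)self;
}

static const ToyHost_entry toy_host_methods[] = {
    { "get_doc_num", ToyHost_get_doc_num },
};

int
Native_callback_i(void *self, const char *meth, int num_args)
{
    size_t i;
    size_t count = sizeof(toy_host_methods) / sizeof(toy_host_methods[0]);
    (void)num_args;
    for (i = 0; i < count; i++) {
        if (strcmp(toy_host_methods[i].name, meth) == 0) {
            return toy_host_methods[i].func(self);
        }
    }
    return -1; /* a real binding would raise a host-level error */
}
```

Because the core sees only the declaration, each binding can supply its own definition at link time, and no #ifdef blocks are needed in the shared tree.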

If we can pull this off, it allows us to move more code into Lucy's  
shared C core, reducing redundancy -- while simultaneously giving  
users maximum flexibility to innovate in their language of choice.

Marvin Humphrey
Rectangular Research
