incubator-lucy-dev mailing list archives

From "Marvin Humphrey (JIRA)" <>
Subject [jira] Commented: (LUCY-5) Boilerplater compiler
Date Thu, 12 Mar 2009 21:34:50 GMT


Marvin Humphrey commented on LUCY-5:

> I could use a little more big picture here -- how does this compiler "fit in"?  

C is typically labeled a "procedural" language, but people use it for object
oriented programming all the time, from lightweight applications to projects
like GTK+/Gnome.  One
common, lightweight technique is to store function pointers directly in struct
based objects.  Both KinoSearch 0.1x and Ferret use this approach, as does

However, the methods-as-members approach doesn't scale well.  If you have a
large number of methods, objects become unreasonably large.  Furthermore, you
can't add more virtual methods to the end of a base class struct definition
without changing the struct definitions of all of its subclasses.  That causes
the ABI to break, so compiled extensions blow up.  You are left with the
choice of either breaking backwards compatibility or accepting severe
constraints on core library development.
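To make the layout problem concrete, here is a minimal sketch of the
methods-as-members technique.  The Animal and Dog types and every function
name in it are hypothetical, invented for illustration; none of this is
KinoSearch, Ferret, or Lucy code.

```c
#include <stddef.h>

/* Hypothetical methods-as-members object: every instance carries its own
 * copy of each function pointer. */
typedef struct Animal Animal;
struct Animal {
    const char *(*speak)(Animal *self);  /* virtual method slot 1 */
    int         (*legs)(Animal *self);   /* virtual method slot 2 */
    const char *name;                    /* actual instance data */
};

/* A subclass embeds the base struct first, so a Dog* can be treated as an
 * Animal*.  Its own fields start right after the base struct. */
typedef struct Dog {
    Animal base;
    int    tricks_known;
} Dog;

static const char *Dog_speak(Animal *self) { (void)self; return "Woof"; }
static int         Dog_legs(Animal *self)  { (void)self; return 4; }

void Dog_init(Dog *dog, const char *name) {
    dog->base.speak   = Dog_speak;
    dog->base.legs    = Dog_legs;
    dog->base.name    = name;
    dog->tricks_known = 0;
}

/* Bytes spent on method pointers in every single instance. */
size_t Animal_method_overhead(void) {
    return offsetof(Animal, name);
}

/* Where already-compiled code expects Dog's own field to live.  Adding a
 * third method pointer to Animal would change this value. */
size_t Dog_tricks_offset(void) {
    return offsetof(Dog, tricks_known);
}
```

Every new method pointer in Animal makes each instance larger and shifts
Dog_tricks_offset(), which is why an extension compiled against the old
layout would read the wrong memory.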

To solve the bloating issue, methods can be stored in a shared vtable, a la
C++ and Java.  Single inheritance is sufficient for our needs, so we don't
have to worry about "fixups" and such complications.  However, absent a JIT
compiler, we still have the problem of not being able to add virtual methods
to a base class without breaking binary compatibility in its subclasses (see

To address the virtual method ABI problem, we can use what I call the
"inside-out vtable" approach.  Normally, when compiling virtual method
invocations, the compiler hard-codes the offset into the vtable.  This causes
severe runtime memory errors when a compiled extension expects to find
a function pointer with a certain signature at a given hard-coded offset, but
finds something unexpected and incompatible there instead.  However, if we
store the offsets into the vtable as *variables* -- a change which seems to
have minimal/negligible performance impact -- then a compiled extension can
adapt to a new vtable layout presented by a recompiled core.  We still can't
remove methods, rename them, or change their signatures, but we can add new

The primary function of the Boilerplater compiler is to generate "boilerplate"
C code which supports this OO model.
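As a rough illustration of the inside-out vtable idea described above, the
sketch below stores a method's vtable offset in a variable exported by the
core rather than baking it into callers.  The Counter class, its VTable
layout, and all names here are assumptions made for the example, not
Boilerplater output.

```c
#include <stddef.h>

typedef struct Counter Counter;
typedef int (*Counter_next_t)(Counter *self);

/* Hypothetical vtable shared by all Counter objects.  A later core release
 * could add more slots, moving `next` to a different offset. */
typedef struct VTable {
    const char     *class_name;
    Counter_next_t  next;
} VTable;

struct Counter {
    VTable *vtable;
    int     value;
};

/* The "inside-out" part: the offset is a variable owned by the core.
 * Extensions read it at run time instead of hard-coding a constant. */
size_t Counter_Next_OFFSET = offsetof(VTable, next);

int Counter_next_impl(Counter *self) {
    return ++self->value;
}

VTable COUNTER_VTABLE = { "Counter", Counter_next_impl };

/* Invocation wrapper: find the slot via the offset variable, then call
 * whatever function pointer is stored there. */
int Counter_Next(Counter *self) {
    char *const method_address = (char*)self->vtable + Counter_Next_OFFSET;
    const Counter_next_t method = *((Counter_next_t*)method_address);
    return method(self);
}
```

Because Counter_Next() consults Counter_Next_OFFSET on every call, a
recompiled core that reorders or extends the vtable only needs to publish
the new offset; callers compiled earlier keep working.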

> Is this analogous to SWIG (used to easily autogenerate bindings in dynamic
> languages X, Y and Z)? 

That's Boilerplater's secondary function.

We already need Boilerplater to parse headers and build up a representation of
the Lucy OO tree (using Boilerplater::Method, Boilerplater::Class, etc), so
that we can generate our "boilerplate" OO support code.  If we're already
doing that much, it's not that hard to add a few additional modules to
autogenerate binding code.

However, the bindings we can generate with Boilerplater are much more powerful
and integrated into our custom OO model than what we could achieve with SWIG.
SWIG bindings allow you to invoke the C library from the host via wrappers.
Bindings generated by Boilerplater, on the other hand, allow you to write
subclasses entirely in the host language which override methods defined in the
C core.

When you create a pure-Perl subclass, e.g. "MockScorer", a lookup is performed
against the VTable_registry hash to see whether a VTable object exists which
corresponds to that class name.  If not, we dupe the parent class's VTable,
modifying the dupe by swapping out its class name and storing a reference to the
new parent.  Then we walk the Perl symbol table for "MockScorer" looking for
method names which match up with the public methods defined by the parent
class Scorer.  For each one that we find, we replace the function pointer at
that slot in the vtable with a custom-tailored function which calls back to
Perl and invokes the pure-Perl method.
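The dupe-and-override step might look roughly like the sketch below.  It is
a simplification under stated assumptions: the VTable struct has a single
method slot, the registry lookup is omitted, the symbol table walk is
reduced to a flag passed in by the caller, and the host callback is a stub
that returns a fixed value.  All names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Scorer Scorer;
typedef int (*Scorer_next_t)(Scorer *self);

/* Hypothetical, heavily simplified vtable with one method slot. */
typedef struct VTable {
    const char          *class_name;
    const struct VTable *parent;
    Scorer_next_t        next;
} VTable;

struct Scorer {
    const VTable *vtable;
    int           doc_id;
};

/* Core C implementation inherited by default. */
static int Scorer_next_core(Scorer *self) {
    return ++self->doc_id;
}

const VTable SCORER = { "Scorer", NULL, Scorer_next_core };

/* Stand-in for the function that would call back into the host language.
 * A real binding would invoke the host method; this stub returns 42. */
int MockScorer_next_callback(Scorer *self) {
    (void)self;
    return 42;
}

/* Dupe the parent's vtable, swap in the new class name, record the parent,
 * and replace the slot if the host class defines the method.  The flag
 * host_defines_next stands in for the symbol table walk. */
VTable *VTable_subclass(const VTable *parent, const char *class_name,
                        int host_defines_next) {
    VTable *dupe = malloc(sizeof(VTable));
    if (dupe == NULL) { return NULL; }
    memcpy(dupe, parent, sizeof(VTable));
    dupe->class_name = class_name;
    dupe->parent     = parent;
    if (host_defines_next) {
        dupe->next = MockScorer_next_callback;
    }
    return dupe;
}
```

Slots the host class does not override keep the parent's function pointers,
so the subclass inherits the C implementations unchanged.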

> Can you post an example of the output code generated? 

Here's the method-invocation wrapper for Scorer_Next.

extern size_t Lucy_Scorer_Next_OFFSET;
static CHY_INLINE chy_i32_t
Lucy_Scorer_Next(const void *vself)
{
    lucy_Matcher *const self = (lucy_Matcher*)vself;
    char *const method_address = (char*)self->_ + Lucy_Scorer_Next_OFFSET;
    const lucy_Matcher_next_t method = *((lucy_Matcher_next_t*)method_address);
    return method(self);
}

Here's the callback which gets installed in the VTable when we discover that
the pure-Perl class "MockScorer" has defined a method named "next".

chy_i32_t
lucy_Matcher_next(lucy_Matcher* self)
{
    return (chy_i32_t)lucy_Native_callback_i(self, "next", 0);
}

Each binding will have to implement lucy_Native_callback_i() and a few other
methods declared by Native.

> Boilerplater compiler
> ---------------------
>                 Key: LUCY-5
>                 URL:
>             Project: Lucy
>          Issue Type: New Feature
>          Components: Boilerplater
>            Reporter: Marvin Humphrey
>            Assignee: Marvin Humphrey
>
> Boilerplater is a small compiler which supports a vtable-based object model.
> The output is C code which adheres to the design that Dave Balmain and I
> hammered out a while back; the input is a collection of ".bp" header files.
>
> Our original intent was to pepper traditional C ".h" header files with no-op
> macros to define each class's interface; the code generator would understand
> these macros but the C compiler would ignore them.  C source code files would
> then pound-include both the ".h" header and the auxiliary, generated ".bp"
> file.
>
> The problem with this approach is that C syntax is too constraining.  Because
> C does not support namespacing, every symbol has to be prepended with a prefix
> to avoid conflicts.  Furthermore, adding metadata to declarations (such as
> default values for arguments, or whether NULL is an acceptable value) is
> awkward.  The result is ".h" header files that are excessively verbose,
> cumbersome to edit, and challenging to parse visually and to grok.
>
> The solution is to make the ".bp" file the master header file, and write it in
> a small, purpose-built, declaration-only language.  The
> code-generator/compiler chews this ".bp" file and spits out a single ".h"
> header file for pound-inclusion in ".c" source code files.
>
> This isn't really that great a divergence from the original plan.  There's no
> fixed point at which a "code generator" becomes a "compiler", and while the
> declaration-only header language has a few conventions that core developers
> will have to familiarize themselves with, the same was true for the no-op
> macro scheme.  Furthermore, the Boilerplater compiler itself is merely an
> implementation detail; it is not publicly exposed and thus can be modified at
> will.  Users who access Lucy via Perl, Ruby, Java, etc will never see it.
> Even Lucy's C users will never see it, because the public C API itself will be
> defined by a lightweight binding and generated documentation.
>
> The important thing for us to focus on is the *output* code generated by
> Boilerplater.  We must nail the object model.  It has to be fast.  It has to
> live happily as a symbiote within each host.  It has to support callbacks into
> the host language, so that users may define custom subclasses and override
> methods easily.  It has to present a robust ABI that makes it possible to
> recompile an updated core without breaking compiled extensions (like Java,
> unlike C++).
>
> The present implementation of the Boilerplater compiler is a collection of
> Perl modules: Boilerplater::Type, Boilerplater::Variable,
> Boilerplater::Method, Boilerplater::Class, and so on.  One CPAN module is
> required, Parse::RecDescent; however, only core developers will need either
> Perl or Parse::RecDescent, since public distributions of Lucy will
> contain pre-generated code.  Some of Boilerplater's modules have kludgy
> internals, but on the whole they seem to do a good job of throwing errors rather
> than failing subtly.
>
> I expect to submit individual Boilerplater modules using JIRA sub-issues which
> reference this one, to allow room for adequate commentary.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
