incubator-lucy-dev mailing list archives

From "Michael McCandless (JIRA)"
Subject [jira] Commented: (LUCY-5) Boilerplater compiler
Date Mon, 16 Mar 2009 10:11:50 GMT


Michael McCandless commented on LUCY-5:

> The stub class won't be necessary. You can write your subclass in pure Python.

OK, nice.

> OK so it sounds like calling functions/methods fit into various
> categories:

> * Entirely Lucy internal --> just call the function directly, so
> normal C compilation handles this.

That seems to me like a weird way of putting it, so maybe I'm not grokking
More likely the reverse!

I think the answer is yes - but isn't that true for vtable-based
subclassing in general? Only invocations of "final" methods can be resolved
to a function address at compile-time. All other method invocations have to
go through the double dereference to find the function address in the vtable.
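
For readers following along, here is a minimal C sketch of the distinction drawn above between a "final" method, whose address is known at compile time, and a virtual method, which is found through the object's vtable. All names (Obj, VTable, Obj_Get_Weight, and so on) are hypothetical and chosen for illustration; this is not the code Boilerplater generates.

```c
/* Illustrative only: every identifier here is an assumption, not Lucy's API. */
typedef struct Obj Obj;

typedef struct VTable {
    int (*get_weight)(Obj *self);   /* slot for an overridable method */
} VTable;

struct Obj {
    const VTable *vtable;           /* each object points at its class's vtable */
    int value;
};

/* A "final" method: the compiler can bind callers directly to this address. */
static int Obj_get_value(Obj *self) {
    return self->value;
}

/* Two implementations of the same virtual method. */
static int Base_get_weight(Obj *self) { return self->value; }
static int Sub_get_weight(Obj *self)  { return self->value * 2; }

static const VTable BASE_VTABLE = { Base_get_weight };
static const VTable SUB_VTABLE  = { Sub_get_weight };

/* Virtual invocation: first dereference reaches the vtable, the second
 * fetches the function address from its slot, and only then is it called. */
static int Obj_Get_Weight(Obj *self) {
    return self->vtable->get_weight(self);
}
```

Calling Obj_Get_Weight on an object whose vtable is SUB_VTABLE reaches Sub_get_weight even though the call site never names it, whereas a call to Obj_get_value is an ordinary C function call.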

Sorry, I meant: are there many internal -> internal function calls
(ie, "normal" C function calls)?

EG say Lucy were up and running, and you ran a trace on indexing N
docs, gathering counters for how many times 1) a "normal" C function
was invoked (eg say calling sqrt()), 2) a "dynamic vtable" method was
invoked but the target method was implemented in C, and 3) a "dynamic
vtable" method was invoked that has been implemented in the host
language, so you dispatched to its runtime.

Those are the 3 categories I was wondering about; it sounds like
category 1) is actually rather small in Lucy?  Which means, most APIs
are in theory overridable in the host language?  (The "dynamic vtable"
API surface area is relatively high).

Eg in IndexWriter, Lucene has various internal (private/protected)
methods for doing merging - mergeInit, mergeMiddle, mergeCommit,
mergeFinish, etc. - that are not meant to be overridden.  These would
be category 1).
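
To make the three categories concrete, here is a rough C sketch of the kind of counting trace described above. Everything in it is hypothetical: the Scorer type, the counter names, and in particular fake_host_score, which merely stands in for a call into a host-language runtime such as Perl or Python.

```c
#include <stddef.h>

/* Illustrative only: every identifier here is an assumption, not Lucy's API. */
enum { CALL_DIRECT, CALL_VTABLE_C, CALL_VTABLE_HOST, CALL_KINDS };
static unsigned long call_counts[CALL_KINDS];

typedef struct Scorer Scorer;
typedef struct ScorerVTable {
    double (*score)(Scorer *self, double freq);
} ScorerVTable;

struct Scorer {
    const ScorerVTable *vtable;
    double (*host_callback)(double freq);  /* stand-in for the host runtime */
};

/* Category 1: a plain C function, bound at compile time. */
static double halve(double x) {
    call_counts[CALL_DIRECT]++;
    return x / 2.0;
}

/* Category 2: reached through the vtable, implemented in C. */
static double Scorer_score_c(Scorer *self, double freq) {
    (void)self;
    call_counts[CALL_VTABLE_C]++;
    return halve(freq);
}

/* Category 3: reached through the vtable, then forwarded to the host. */
static double Scorer_score_host(Scorer *self, double freq) {
    call_counts[CALL_VTABLE_HOST]++;
    return self->host_callback(freq);
}

/* Pretend host-language override; a real binding would call into the
 * host interpreter here. */
static double fake_host_score(double freq) {
    return freq + 1.0;
}

static const ScorerVTable C_VTABLE    = { Scorer_score_c };
static const ScorerVTable HOST_VTABLE = { Scorer_score_host };

/* The call site looks the same for categories 2 and 3. */
static double Scorer_Score(Scorer *self, double freq) {
    return self->vtable->score(self, freq);
}
```

After a run, the relative sizes of the three entries in call_counts would answer the question of how much of the call volume is ordinary C, how much is vtable dispatch that stays in C, and how much crosses into the host.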

Yes. Actually, Balmain ultimately persuaded me that the entire core should be
in C. I'm cool with that since Boilerplater makes pure-LanguageX subclassing


> * Lucy invokes "dynamically dispatched" API, and in fact its impl in
> the current context is defined in the host language, so we go
> through the full dynamic dispatch.

I think you and I are using the term "dynamic dispatch" to mean different
things. I'm using it in the sense of "resolved at run-time", so any virtual
method qualifies - including C++ and Java virtual methods, even though C++
and Java aren't typically called "dynamic languages".

I actually intended my usage to be this definition, ie your specific
implementation of dynamic dispatch in Lucy.

> Boilerplater compiler
> ---------------------
>                 Key: LUCY-5
>                 URL:
>             Project: Lucy
>          Issue Type: New Feature
>          Components: Boilerplater
>            Reporter: Marvin Humphrey
>            Assignee: Marvin Humphrey
> Boilerplater is a small compiler which supports a vtable-based object model.
> The output is C code which adheres to the design that Dave Balmain and I
> hammered out a while back; the input is a collection of ".bp" header files.
> Our original intent was to pepper traditional C ".h" header files with no-op
> macros to define each class's interface; the code generator would understand
> these macros but the C compiler would ignore them.  C source code files would
> then pound-include both the ".h" header and the auxiliary, generated ".bp"
> file.
> The problem with this approach is that C syntax is too constraining.  Because
> C does not support namespacing, every symbol has to be prepended with a prefix
> to avoid conflicts.  Furthermore, adding metadata to declarations (such as
> default values for arguments, or whether NULL is an acceptable value) is
> awkward.  The result is ".h" header files that are excessively verbose,
> cumbersome to edit, and challenging to parse visually and to grok.
> The solution is to make the ".bp" file the master header file, and write it in
> a small, purpose-built, declaration-only language.  The
> code-generator/compiler chews this ".bp" file and spits out a single ".h"
> header file for pound-inclusion in ".c" source code files.
> This isn't really that great a divergence from the original plan.  There's no
> fixed point at which a "code generator" becomes a "compiler", and while the
> declaration-only header language has a few conventions that core developers
> will have to familiarize themselves with, the same was true for the no-op
> macro scheme.  Furthermore, the Boilerplater compiler itself is merely an
> implementation detail; it is not publicly exposed and thus can be modified at
> will.  Users who access Lucy via Perl, Ruby, Java, etc will never see it.
> Even Lucy's C users will never see it, because the public C API itself will be
> defined by a lightweight binding and generated documentation.
> The important thing for us to focus on is the *output* code generated by
> Boilerplater.  We must nail the object model.  It has to be fast.  It has to
> live happily as a symbiote within each host.  It has to support callbacks into
> the host language, so that users may define custom subclasses and override
> methods easily.  It has to present a robust ABI that makes it possible to
> recompile an updated core without breaking compiled extensions (like Java,
> unlike C++).  
> The present implementation of the Boilerplater compiler is a collection of
> Perl modules: Boilerplater::Type, Boilerplater::Variable,
> Boilerplater::Method, Boilerplater::Class, and so on.  One CPAN module is
> required, Parse::RecDescent; however, only core developers will need either
> Perl or Parse::RecDescent, since public distributions of Lucy will 
> contain pre-generated code.  Some of Boilerplater's modules have kludgy 
> internals, but on the whole they seem to do a good job of throwing errors rather 
> than failing subtly.
> I expect to submit individual Boilerplater modules using JIRA sub-issues which
> reference this one, to allow room for adequate commentary.
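
The issue description asks for an ABI that lets an updated core be recompiled without breaking compiled extensions. One way such a goal could be approached, shown here purely as an assumption and not as the design the issue refers to, is to have extensions read each method's vtable offset from a variable exported by the core at run time instead of baking the slot position into compiled code. All names below are hypothetical.

```c
#include <stddef.h>

/* Illustrative only: every identifier here is an assumption, not Lucy's API. */
typedef struct Obj Obj;
typedef int (*method_t)(Obj *self);

/* Core side: the vtable layout is private to the core and may change.
 * Imagine to_int was added in a later release, shifting get_weight. */
typedef struct VTable {
    method_t to_int;
    method_t get_weight;
} VTable;

struct Obj {
    const VTable *vtable;
    int value;
};

static int Obj_to_int(Obj *self)     { return self->value; }
static int Obj_get_weight(Obj *self) { return self->value; }

static const VTable OBJ_VTABLE = { Obj_to_int, Obj_get_weight };

/* Exported by the core as data: an extension reads this when it runs,
 * so it picks up the new slot position without being recompiled. */
const size_t Obj_Get_Weight_OFFSET = offsetof(VTable, get_weight);

/* Extension side: uses only the exported offset, never the struct layout. */
static int ext_call_get_weight(Obj *self) {
    const char *base = (const char *)self->vtable;
    method_t fn = *(const method_t *)(base + Obj_Get_Weight_OFFSET);
    return fn(self);
}
```

The cost is one extra memory load per call compared with a fixed slot index; the benefit is that inserting or reordering methods in the core changes only the value of the exported offset.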

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
