lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-dev] OFFSET globals
Date Thu, 19 Apr 2012 21:55:55 GMT
On Thu, Apr 19, 2012 at 9:12 AM, Nick Wellnhofer <wellnhofer@aevum.de> wrote:
> Attached is a patch that replaces the OFFSET globals with NUM #defines which
> contain the method number. This more than halves the number of extern
> symbols in Lucy.so. The only cost is a constant add for every method
> invocation which should be negligible.
>
> This also removes the use of offsetof in parcel.c, so we don't need the
> definition of the VTable struct in that file.
>
> Thoughts?

Unfortunately, applying this patch would severely constrain the development of
the Lucy core.  Because it freezes the layout of our vtables by hard-coding
the offsets at which method pointers are located, we would not be able to add
new methods to Lucy (among other problems) without breaking compiled
extensions.

Here's what we want to avoid:

    http://s.apache.org/hzY

    The idea is that if you install an independent third-party compiled
    extension like "LucyX::RTree", it should still work after you upgrade the
    Lucy core.

    Using Perl/CPAN as an example, consider the following sequence of events:

    1. Install Lucy 1.00 via CPAN.
    2. Install LucyX::RTree via CPAN.
    3. Upgrade Lucy to version 1.01 via CPAN.

    If we do not preserve Lucy's binary compatibility from version 1.00 to
    1.01, apps which use LucyX::RTree will suddenly start crashing hard
    immediately after the upgrade finishes. That's not acceptable.

Here's the mechanism by which ABI compat breaks:

    Say that we have a core class "Dog" with two methods, bark() and bite(),
    and an externally compiled subclass "Boxer" which overrides bark() and
    adds drool().

        Dog_vtable = {
            Dog_bark,
            Dog_bite
        };

        Boxer_vtable = {
            Boxer_bark,
            Dog_bite,
            Boxer_drool
        };

    Now say that we add eat(Food *food) to the base class Dog:

        Dog_vtable = {
            Dog_bark,
            Dog_bite,
            Dog_eat
        };

    Unfortunately, the externally compiled Boxer_vtable has a fixed layout,
    and it puts Boxer_drool in the slot where the core expects to find eat().
    When the core tries to call eat() on a Boxer object, chaos will ensue.
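Here is a small C sketch of that breakage, assuming a vtable is just an array of function pointers indexed by slot numbers that are fixed at compile time. It is an illustration, not Clownfish's generated code: `lookup_eat()` and the `*_SLOT` constants are hypothetical. In the real scenario eat() takes a `Food *`, so the mix-up would be undefined behavior. To keep the mismatch visible without crashing, every method here shares one signature and only reports its own name.

```c
/* Uniform signature so the slot mix-up is observable without UB. */
typedef const char *(*method_t)(void *self);

static const char *Dog_bark(void *self)    { (void)self; return "Dog_bark"; }
static const char *Dog_bite(void *self)    { (void)self; return "Dog_bite"; }
static const char *Dog_eat(void *self)     { (void)self; return "Dog_eat"; }
static const char *Boxer_bark(void *self)  { (void)self; return "Boxer_bark"; }
static const char *Boxer_drool(void *self) { (void)self; return "Boxer_drool"; }

/* Core 1.01: slot numbers are compile-time constants baked into callers. */
enum { DOG_BARK_SLOT = 0, DOG_BITE_SLOT = 1, DOG_EAT_SLOT = 2 };

static method_t Dog_vtable[] = { Dog_bark, Dog_bite, Dog_eat };

/* Extension built against core 1.00, when Dog had only two slots, so
 * drool was appended at slot 2. This layout is frozen in the extension. */
static method_t Boxer_vtable[] = { Boxer_bark, Dog_bite, Boxer_drool };

/* How core 1.01 finds eat(): a fixed index into whatever vtable it gets. */
static method_t lookup_eat(method_t *vtable) {
    return vtable[DOG_EAT_SLOT];
}
```

`lookup_eat(Dog_vtable)` yields `Dog_eat`, but `lookup_eat(Boxer_vtable)` yields `Boxer_drool`, which is exactly the slot confusion described above.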

Here is the rationale for those OFFSET globals as the solution:

    http://s.apache.org/pSd

    To address the virtual method ABI problem, we can use what I call the
    "inside-out vtable" approach. Normally, when compiling virtual method
    invocations, the compiler hard-codes the offset into the vtable. This
    causes severe runtime memory errors when a compiled extension expects to
    find a function pointer with a certain signature at a given hard-coded
    offset, but finds something unexpected and incompatible there instead.
    However, if we store the offsets into the vtable as variables – a change
    which seems to have minimal/negligible performance impact – then a
    compiled extension can adapt to a new vtable layout presented by a
    recompiled core. We still can't remove methods, rename them, or change
    their signatures, but we can add new ones.

    http://s.apache.org/hzY

    The "inside-out" aspect of using individual variables to hold the offsets
    was inspired by the "inside-out object" technique drawn from Perl culture.
    However, the idea of using variable vtable offsets has been studied
    before, and is actually implemented in GCJ.

    See "Supporting Binary Compatibility with Static Compilation" by Dachuan
    Yu, Zhong Shao, and Valery Trifonov, at
    http://www.usenix.org/events/javavm02/yu/yu_html/index.html.
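A rough C sketch of the inside-out idea follows, under stated assumptions. The `VTable` struct, `core_init()`, `boxer_init()`, and `lookup_method()` are hypothetical. The `*_OFFSET` variables hold slot indices rather than byte offsets to keep it short. The point is that the extension never bakes in a slot number. It copies whatever layout the freshly recompiled core presents, overrides through the offset variables, and assigns its own novel method the first slot past the parent's.

```c
#include <stddef.h>

typedef const char *(*method_t)(void *self);

static const char *Dog_bark(void *self)    { (void)self; return "Dog_bark"; }
static const char *Dog_bite(void *self)    { (void)self; return "Dog_bite"; }
static const char *Dog_eat(void *self)     { (void)self; return "Dog_eat"; }
static const char *Boxer_bark(void *self)  { (void)self; return "Boxer_bark"; }
static const char *Boxer_drool(void *self) { (void)self; return "Boxer_drool"; }

/* Hypothetical vtable: a slot count plus room for method pointers. */
typedef struct {
    size_t   num_methods;
    method_t methods[8];
} VTable;

/* One exported variable per method instead of a compile-time constant. */
size_t Dog_bark_OFFSET, Dog_bite_OFFSET, Dog_eat_OFFSET;
size_t Boxer_drool_OFFSET;

static VTable Dog_vtable;
static VTable Boxer_vtable;

/* Core 1.01 (recompiled with eat()) assigns its offsets at startup. */
static void core_init(void) {
    Dog_bark_OFFSET = 0;
    Dog_bite_OFFSET = 1;
    Dog_eat_OFFSET  = 2;
    Dog_vtable.num_methods = 3;
    Dog_vtable.methods[Dog_bark_OFFSET] = Dog_bark;
    Dog_vtable.methods[Dog_bite_OFFSET] = Dog_bite;
    Dog_vtable.methods[Dog_eat_OFFSET]  = Dog_eat;
}

/* Extension compiled once against core 1.00: no slot numbers baked in. */
static void boxer_init(void) {
    Boxer_vtable = Dog_vtable;                          /* inherit layout */
    Boxer_vtable.methods[Dog_bark_OFFSET] = Boxer_bark; /* override bark() */
    Boxer_drool_OFFSET = Boxer_vtable.num_methods++;    /* novel method */
    Boxer_vtable.methods[Boxer_drool_OFFSET] = Boxer_drool;
}

/* Invocation reads the offset variable at call time. */
static method_t lookup_method(const VTable *vt, size_t offset) {
    return vt->methods[offset];
}
```

After `core_init()` and then `boxer_init()`, a lookup on `Boxer_vtable` with `Dog_eat_OFFSET` finds the inherited `Dog_eat`, while drool has simply moved to slot 3. The ordering matters: the core must publish its offsets before any extension builds its vtable, which is roughly the job a runtime bootstrap step would do.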

I have occasionally fantasized about writing a JIT for Clownfish to get rid
of those OFFSET globals: something along the lines of what Dachuan Yu et al.
propose in their paper.  However, their mechanism requires knowledge
of the class implementation, which in Clownfish's case resides in C code that
CFC doesn't have access to.

Marvin Humphrey
