lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-dev] OFFSET globals
Date Fri, 20 Apr 2012 17:44:41 GMT
On Fri, Apr 20, 2012 at 3:02 AM, Nick Wellnhofer <wellnhofer@aevum.de> wrote:
> On 19/04/2012 23:55, Marvin Humphrey wrote:
>> Unfortunately, applying this patch would severely constrain the development
>> of the Lucy core.  Because it freezes the layout of our vtables by
>> hard-coding the offsets at which method pointers are located, we would not
>> be able to add new methods to Lucy (among other problems) without breaking
>> compiled extensions.
>
> But that's exactly what we do right now. The offsets are never changed.

Heh.

The original concept was to have one OFFSET var per novel method.  However,
that design was flawed because it caused the ABI to break if the method's
novel declaration moved from one class to another, e.g. up into a parent
class: a compiled extension would be counting on the existence of
Parcel_OldClass_MethodName_OFFSET, but that var would go away with the new
release of the core, breaking ABI compat.

Creating OFFSET vars for every invocant/method-name combo solved that problem,
though at the cost of considerable DLL symbol proliferation.  However, as you
point out, a mechanism to propagate the offsets was never introduced.

> If we want binary compatibility across different versions, a compiled
> extension would have to create the offsets at run-time.

Yes.

>> I have occasionally fantasized about writing a JIT for Clownfish to get rid
>> of those OFFSET globals: something along the lines of what Dachuan Yu et al
>> propose in their paper.  However, their mechanism requires
>> knowledge of the class implementation, which in Clownfish's case resides in
>> C code that CFC doesn't have access to.
>
> AFAICS, we'll need something like the mechanism described in the paper. In
> the Dog/Boxer example above, the Boxer VTable could be initialized like this:
>
>    BOXER = VTable_allocate(DOG->vt_alloc_size + extra_space)
>    memcpy(BOXER->methods, DOG->methods, dog_methods_size)
>    Boxer_Bark_OFFSET  = Dog_Bark_OFFSET;
>    Boxer_Bite_OFFSET  = Dog_Bite_OFFSET;
>    Boxer_Drool_OFFSET = DOG->vt_alloc_size;
>    VTable_Override(BOXER, Boxer_bark,  Boxer_Bark_OFFSET);
>    VTable_Override(BOXER, Boxer_drool, Boxer_Drool_OFFSET);
>
> This shouldn't be too hard to implement.

Indeed, this is exactly what we need to complete the existing design.
Excellent diagnosis and prescription, Herr Doktor Wellnhofer. :)
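
To make the run-time offset assignment concrete, here is a rough,
self-contained sketch using a toy vtable.  Everything in it (ToyVTable,
toy_vtable_subclass, toy_override, the bootstrap functions) is invented for
illustration and is not the Clownfish API; it only assumes that method slots
are appended after the parent's slots, as in the Dog/Boxer pseudocode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins, not the real Clownfish types. */
typedef const char *(*toy_method_t)(void);

typedef struct ToyVTable {
    size_t       vt_alloc_size;  /* total bytes: header plus method slots */
    toy_method_t methods[];      /* C99 flexible array member */
} ToyVTable;

/* Offsets are ordinary variables, assigned while bootstrapping. */
static size_t Dog_Bark_OFFSET, Dog_Bite_OFFSET;
static size_t Boxer_Bark_OFFSET, Boxer_Bite_OFFSET, Boxer_Drool_OFFSET;

static const char *Dog_bark(void)    { return "Woof"; }
static const char *Dog_bite(void)    { return "Chomp"; }
static const char *Boxer_bark(void)  { return "WOOF"; }
static const char *Boxer_drool(void) { return "Drip"; }

/* Allocate a root vtable with room for num_methods slots. */
static ToyVTable *
toy_vtable_new(size_t num_methods) {
    size_t size = offsetof(ToyVTable, methods)
                + num_methods * sizeof(toy_method_t);
    ToyVTable *vt = calloc(1, size);
    if (vt == NULL) { abort(); }
    vt->vt_alloc_size = size;
    return vt;
}

/* Copy the parent's vtable and append num_novel empty slots. */
static ToyVTable *
toy_vtable_subclass(const ToyVTable *parent, size_t num_novel) {
    size_t size = parent->vt_alloc_size + num_novel * sizeof(toy_method_t);
    ToyVTable *vt = calloc(1, size);
    if (vt == NULL) { abort(); }
    memcpy(vt, parent, parent->vt_alloc_size);
    vt->vt_alloc_size = size;
    return vt;
}

/* Store a method pointer at a byte offset within the vtable. */
static void
toy_override(ToyVTable *vt, toy_method_t func, size_t offset) {
    assert(offset + sizeof(func) <= vt->vt_alloc_size);
    memcpy((char*)vt + offset, &func, sizeof(func));
}

/* Fetch the method pointer stored at a byte offset. */
static toy_method_t
toy_method_at(const ToyVTable *vt, size_t offset) {
    toy_method_t method;
    memcpy(&method, (const char*)vt + offset, sizeof(method));
    return method;
}

static ToyVTable *
bootstrap_dog(void) {
    ToyVTable *dog = toy_vtable_new(2);
    Dog_Bark_OFFSET = offsetof(ToyVTable, methods);
    Dog_Bite_OFFSET = Dog_Bark_OFFSET + sizeof(toy_method_t);
    toy_override(dog, Dog_bark, Dog_Bark_OFFSET);
    toy_override(dog, Dog_bite, Dog_Bite_OFFSET);
    return dog;
}

static ToyVTable *
bootstrap_boxer(const ToyVTable *dog) {
    ToyVTable *boxer = toy_vtable_subclass(dog, 1);
    Boxer_Bark_OFFSET  = Dog_Bark_OFFSET;     /* inherited slot, overridden */
    Boxer_Bite_OFFSET  = Dog_Bite_OFFSET;     /* inherited slot, unchanged  */
    Boxer_Drool_OFFSET = dog->vt_alloc_size;  /* first slot past the parent */
    toy_override(boxer, Boxer_bark,  Boxer_Bark_OFFSET);
    toy_override(boxer, Boxer_drool, Boxer_Drool_OFFSET);
    return boxer;
}
```

Since the extension reads the parent's size when it bootstraps, adding a
method to Dog in a later release only shifts Boxer_Drool_OFFSET; nothing in
the compiled extension has to change.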

I've considered several other approaches to solving this problem, but I
haven't yet found one that's suitable.

A traditional JIT would transform an intermediate stage of compilation into
final object code.  In our case, the only transformation we want to make is
to update hard-coded vtable offsets at every method invocation site.
However, Clownfish at this point is just an interface description language,
and the C implementation code compiles down to system-specific library formats
like ELF, DLL, etc.  I don't know of a way to treat those as input to a JIT.
:)

In theory, another possibility is to turn method invocations into global
functions which delegate.

    // autogen

    bool_t
    Cfish_Obj_Equals(const cfish_Obj *self, cfish_Obj *other) {
        char *const method_address = *(char**)self
                                   + cfish_VTable_offset_of_methods
                                   + Cfish_Obj_Equals_NUM
                                     * sizeof(cfish_method_t);
        const Cfish_Obj_Equals_t method
            = *((Cfish_Obj_Equals_t*)method_address);
        return method(self, other);
    }

    // Obj.c

    bool_t
    Cfish_Obj_Equals_IMPL(cfish_Obj *self, cfish_Obj *other) {
        // ...
    }

However, then we face the same symbol proliferation problem we're currently
facing with our global OFFSET vars: we need one delegator for each invocant
type, or the ABI will break if the first declaration moves to another class.

Another approach is to use static variables which trigger some sort of
initialization the first time they are used, causing the method invocation
behavior to self-modify.

Here's one variant which modifies a static function pointer...

    static chy_bool_t
    Cfish_Obj_Equals_INIT(cfish_Obj *self, cfish_Obj *other);

    static Cfish_Obj_Equals_t Cfish_Obj_Equals = Cfish_Obj_Equals_INIT;

    static chy_bool_t
    Cfish_Obj_Equals_INIT(cfish_Obj *self, cfish_Obj *other) {
        Cfish_Obj_Equals
            = (Cfish_Obj_Equals_t)Obj_Look_Up_Delegator(self, "equals");
        return Cfish_Obj_Equals(self, other);
    }
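
A runnable toy version of the same trick, for anyone who wants to poke at it.
The ToyObj/ToyVTable types, toy_look_up_delegator (standing in for the
hypothetical Obj_Look_Up_Delegator), and the lookup counter are all made up
for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy object model; none of these names are real Clownfish symbols. */
typedef struct ToyObj ToyObj;
typedef bool (*Toy_Equals_t)(ToyObj *self, ToyObj *other);

typedef struct ToyVTable {
    Toy_Equals_t equals;  /* a single method slot is enough for the demo */
} ToyVTable;

struct ToyObj {
    ToyVTable *vtable;
    int        value;
};

/* Counts how many times the (presumably slow) lookup runs. */
static int toy_lookup_count = 0;

/* Delegator that would live in the core: dispatches through the vtable. */
static bool
toy_equals_delegator(ToyObj *self, ToyObj *other) {
    return self->vtable->equals(self, other);
}

/* Hypothetical stand-in for Obj_Look_Up_Delegator: resolve by name. */
static Toy_Equals_t
toy_look_up_delegator(ToyObj *self, const char *name) {
    (void)self;
    toy_lookup_count++;
    return strcmp(name, "equals") == 0 ? toy_equals_delegator : NULL;
}

static bool
Toy_Equals_INIT(ToyObj *self, ToyObj *other);

/* Points at the initializer first, at the delegator from then on. */
static Toy_Equals_t Toy_Equals = Toy_Equals_INIT;

static bool
Toy_Equals_INIT(ToyObj *self, ToyObj *other) {
    Toy_Equals = toy_look_up_delegator(self, "equals");
    assert(Toy_Equals != NULL);
    return Toy_Equals(self, other);
}

/* A concrete implementation to install in a vtable. */
static bool
toy_int_equals(ToyObj *self, ToyObj *other) {
    return self->value == other->value;
}
```

The first call through Toy_Equals pays for the lookup; later calls go
straight to the delegator, which still costs one extra indirect call per
invocation compared with a raw vtable dispatch.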

... and here's an alternative that uses a static OFFSET var, but requires an
extra conditional check for each method invocation.

    static size_t Cfish_Obj_Equals_OFFSET = 0;
    static CHY_INLINE chy_bool_t
    Cfish_Obj_Equals(cfish_Obj *self, cfish_Obj *other) {
        if (Cfish_Obj_Equals_OFFSET == 0) {
            Cfish_Obj_Equals_OFFSET
                = (size_t)Obj_Look_Up_Offset(self, "equals");
        }
        char *const method_address = *(char**)self + Cfish_Obj_Equals_OFFSET;
        const Cfish_Obj_Equals_t method
            = *((Cfish_Obj_Equals_t*)method_address);
        return method(self, other);
    }
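
Again as a runnable toy, with invented names throughout (toy_look_up_offset
stands in for the hypothetical Obj_Look_Up_Offset).  The sketch assumes that
offset 0 can never hold a method pointer because a header field sits there,
which is what makes 0 usable as the "unresolved" sentinel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy object model; none of these names are real Clownfish symbols. */
typedef struct ToyObj ToyObj;
typedef bool (*Toy_Equals_t)(ToyObj *self, ToyObj *other);

typedef struct ToyVTable {
    const char  *class_name;  /* header field, so no method lives at 0 */
    Toy_Equals_t equals;
} ToyVTable;

struct ToyObj {
    ToyVTable *vtable;
    int        value;
};

/* Counts how many times the offset lookup runs. */
static int toy_offset_lookups = 0;

/* Hypothetical stand-in for Obj_Look_Up_Offset: resolve by name. */
static size_t
toy_look_up_offset(ToyObj *self, const char *name) {
    (void)self;
    toy_offset_lookups++;
    assert(strcmp(name, "equals") == 0);
    return offsetof(ToyVTable, equals);
}

/* 0 means "not resolved yet". */
static size_t Toy_Equals_OFFSET = 0;

static inline bool
Toy_Equals(ToyObj *self, ToyObj *other) {
    /* The extra conditional paid on every invocation. */
    if (Toy_Equals_OFFSET == 0) {
        Toy_Equals_OFFSET = toy_look_up_offset(self, "equals");
    }
    char *const method_address = (char*)self->vtable + Toy_Equals_OFFSET;
    const Toy_Equals_t method = *((Toy_Equals_t*)method_address);
    return method(self, other);
}

/* A concrete implementation to install in a vtable. */
static bool
toy_int_equals(ToyObj *self, ToyObj *other) {
    return self->value == other->value;
}
```

Because the OFFSET var is static, each compilation unit that includes the
inline function resolves its own copy the first time it makes the call.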

These self-modifying-init techniques are sort of similar to the Yu proposal,
but have drawbacks:

  * Initialization happens lazily and piecemeal, as opposed to once per
    class file at boot as part of the Java class loader in Yu.
  * AFAICT, there's no way to avoid the overhead of either a delegator or
    an extra conditional for each method invocation.

Marvin Humphrey
