lucy-dev mailing list archives

From Nick Wellnhofer <wellnho...@aevum.de>
Subject Re: [lucy-dev] OFFSET globals
Date Fri, 20 Apr 2012 10:02:40 GMT
On 19/04/2012 23:55, Marvin Humphrey wrote:
> On Thu, Apr 19, 2012 at 9:12 AM, Nick Wellnhofer <wellnhofer@aevum.de> wrote:
>> Attached is a patch that replaces the OFFSET globals with NUM #defines which
>> contain the method number. This more than halves the number of extern
>> symbols in Lucy.so. The only cost is a constant add for every method
>> invocation which should be negligible.
>>
>> This also removes the use of offsetof in parcel.c, so we don't need the
>> definition of the VTable struct in that file.
>>
>> Thoughts?
>
> Unfortunately, applying this patch would severely constrain the development of
> the Lucy core.  Because it freezes the layout of our vtables by hard-coding
> the offsets at which method pointers are located, we would not be able to add
> new methods to Lucy (among other problems) without breaking compiled
> extensions.

But that's exactly what we do right now. The offsets are never changed. 
If we want binary compatibility across different versions, a compiled 
extension would have to create the offsets at run-time.
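
To make the difference concrete, here is a minimal sketch of the two dispatch styles under discussion. It is not the code CFC generates: the `VTable` and `Obj` structs, the `call_by_num`/`call_by_offset` helpers, and the integer return values are all hypothetical and only illustrate the idea. With NUM #defines the method number is baked into the caller at compile time; with OFFSET globals the caller reads a byte offset from a variable at call time.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Obj Obj;
typedef int (*method_t)(Obj *self);

/* Hypothetical, simplified vtable: some metadata followed by method slots. */
typedef struct VTable {
    const char *name;
    method_t    methods[2];
} VTable;

struct Obj { VTable *vtable; };

/* NUM style: method numbers are compile-time constants. */
#define Dog_Bark_NUM 0
#define Dog_Bite_NUM 1

/* OFFSET style: byte offsets live in variables read at call time. */
size_t Dog_Bark_OFFSET = offsetof(VTable, methods) + 0 * sizeof(method_t);
size_t Dog_Bite_OFFSET = offsetof(VTable, methods) + 1 * sizeof(method_t);

int Dog_bark(Obj *self) { (void)self; return 1; }
int Dog_bite(Obj *self) { (void)self; return 2; }

VTable DOG = { "Dog", { Dog_bark, Dog_bite } };

/* Dispatch through a method number (index into the slot array). */
int call_by_num(Obj *self, size_t num) {
    assert(self != NULL);
    return self->vtable->methods[num](self);
}

/* Dispatch through a byte offset measured from the start of the vtable. */
int call_by_offset(Obj *self, size_t offset) {
    assert(self != NULL);
    method_t method = *(method_t *)((char *)self->vtable + offset);
    return method(self);
}
```

In this sketch both styles reach the same function. The difference is only where the slot position comes from: a constant compiled into the extension, or a variable that could in principle be assigned when the extension is loaded.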

> Here's what we want to avoid:
>
>      http://s.apache.org/hzY
>
>      The idea is that if you install an independent third-party compiled
>      extension like "LucyX::RTree", it should still work after you upgrade the
>      Lucy core.
>
>      Using Perl/CPAN as an example, consider the following sequence of events:
>
>      1. Install Lucy 1.00 via CPAN.
>      2. Install LucyX::RTree via CPAN.
>      3. Upgrade Lucy to version 1.01 via CPAN.
>
>      If we do not preserve Lucy's binary compatibility from version 1.00 to
>      1.01, apps which use LucyX::RTree will suddenly start crashing hard
>      immediately after the upgrade finishes. That's not acceptable.
>
> Here's the mechanism by which ABI compat breaks:
>
>      Say that we have a core class "Dog" with two methods, bark() and bite(),
>      and an externally compiled subclass "Boxer" which overrides bark() and
>      adds drool().
>
>          Dog_vtable = {
>              Dog_bark,
>              Dog_bite
>          };
>
>          Boxer_vtable = {
>              Boxer_bark,
>              Dog_bite,
>              Boxer_drool
>          };
>
>      Now say that we add eat(Food *food) to the base class Dog:
>
>          Dog_vtable = {
>              Dog_bark,
>              Dog_bite,
>              Dog_eat
>          };
>
>      Unfortunately, the externally compiled Boxer_vtable has a fixed layout,
>      and it puts Boxer_drool in the slot where the core expects to find eat().
>      When the core tries to call eat() on a Boxer object, chaos will ensue.
>
> Here is the rationale for those OFFSET globals as the solution:
>
>      http://s.apache.org/pSd
>
>      To address the virtual method ABI problem, we can use what I call the
>      "inside-out vtable" approach. Normally, when compiling virtual method
>      invocations, the compiler hard-codes the offset into the vtable. This
>      causes severe runtime memory errors when a compiled extension expects to
>      find a function pointer with a certain signature at a given hard-coded
>      offset, but finds something unexpected and incompatible there instead.
>      However, if we store the offsets into the vtable as variables – a change
>      which seems to have minimal/negligible performance impact – then a
>      compiled extension can adapt to a new vtable layout presented by a
>      recompiled core. We still can't remove methods, rename them, or change
>      their signatures, but we can add new ones.
>
>      http://s.apache.org/hzY
>
>      The "inside-out" aspect of using individual variables to hold the offsets
>      was inspired by the "inside-out object" technique drawn from Perl culture.
>      However, the idea of using variable vtable offsets has been studied
>      before, and is actually implemented in GCJ.
>
>      See "Supporting Binary Compatibility with Static Compilation" by Dachuan
>      Yu, Zhong Shao, and Valery Trifonov, at
>      http://www.usenix.org/events/javavm02/yu/yu_html/index.html.
>
> I have occasionally fantasized about writing a JIT for Clownfish to get rid
> of those OFFSET globals: something along the lines of what Dachuan Yu et al.
> propose in their paper.  However, their mechanism requires knowledge
> of the class implementation, which in Clownfish's case resides in C code that
> CFC doesn't have access to.

AFAICS, we'll need something like the mechanism described in the paper.
In the Dog/Boxer example above, the Boxer VTable could be initialized
like this:

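     /* Allocate room for Dog's vtable plus Boxer's new methods. */
     BOXER = VTable_allocate(DOG->vt_alloc_size + extra_space);
     /* Inherit whatever method pointers the installed core provides. */
     memcpy(BOXER->methods, DOG->methods, dog_methods_size);
     Boxer_Bark_OFFSET  = Dog_Bark_OFFSET;
     Boxer_Bite_OFFSET  = Dog_Bite_OFFSET;
     /* New methods go after the end of Dog's vtable. */
     Boxer_Drool_OFFSET = DOG->vt_alloc_size;
     VTable_Override(BOXER, Boxer_bark,  Boxer_Bark_OFFSET);
     VTable_Override(BOXER, Boxer_drool, Boxer_Drool_OFFSET);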

This shouldn't be too hard to implement.
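
As a rough illustration of the run-time initialization described above, the following self-contained C sketch builds the Boxer vtable from whatever Dog vtable the core presents. Everything here is an assumption made for the example: the `VTable` struct, the helper names (`Dog_vtable_new`, `Boxer_vtable_new`, `call_method`), the use of slot indices instead of byte offsets, and the `with_eat` flag, which merely simulates a newer core that has added eat().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef const char *(*method_t)(void);

/* Hypothetical, simplified vtable: a slot count followed by the slots. */
typedef struct VTable {
    size_t   num_methods;
    method_t methods[];            /* C99 flexible array member */
} VTable;

/* Offsets (slot indices here) are variables assigned at run time. */
size_t Dog_Bark_OFFSET, Dog_Bite_OFFSET, Dog_Eat_OFFSET;
size_t Boxer_Bark_OFFSET, Boxer_Bite_OFFSET, Boxer_Drool_OFFSET;

const char *Dog_bark(void)    { return "Dog barks"; }
const char *Dog_bite(void)    { return "Dog bites"; }
const char *Dog_eat(void)     { return "Dog eats"; }
const char *Boxer_bark(void)  { return "Boxer barks"; }
const char *Boxer_drool(void) { return "Boxer drools"; }

VTable *VTable_allocate(size_t num_methods) {
    VTable *vt = calloc(1, sizeof(VTable) + num_methods * sizeof(method_t));
    if (vt == NULL) { abort(); }
    vt->num_methods = num_methods;
    return vt;
}

/* Core side: with_eat != 0 simulates an upgraded core that added eat(). */
VTable *Dog_vtable_new(int with_eat) {
    VTable *vt = VTable_allocate(with_eat ? 3 : 2);
    Dog_Bark_OFFSET = 0;
    Dog_Bite_OFFSET = 1;
    vt->methods[Dog_Bark_OFFSET] = Dog_bark;
    vt->methods[Dog_Bite_OFFSET] = Dog_bite;
    if (with_eat) {
        Dog_Eat_OFFSET = 2;
        vt->methods[Dog_Eat_OFFSET] = Dog_eat;
    }
    return vt;
}

/* Extension side: lay out Boxer's vtable relative to the Dog it finds. */
VTable *Boxer_vtable_new(const VTable *dog) {
    assert(dog != NULL);
    VTable *vt = VTable_allocate(dog->num_methods + 1);
    memcpy(vt->methods, dog->methods, dog->num_methods * sizeof(method_t));
    Boxer_Bark_OFFSET  = Dog_Bark_OFFSET;
    Boxer_Bite_OFFSET  = Dog_Bite_OFFSET;
    Boxer_Drool_OFFSET = dog->num_methods;   /* first slot past Dog's methods */
    vt->methods[Boxer_Bark_OFFSET]  = Boxer_bark;
    vt->methods[Boxer_Drool_OFFSET] = Boxer_drool;
    return vt;
}

const char *call_method(const VTable *vt, size_t offset) {
    return vt->methods[offset]();
}
```

Because `Boxer_Drool_OFFSET` is computed from the size of the Dog vtable found at load time, drool() lands after eat() when the core has grown, and a core call through `Dog_Eat_OFFSET` on a Boxer reaches the inherited `Dog_eat` rather than `Boxer_drool`.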

Nick
