lucy-dev mailing list archives

From Logan Bell <loganb...@gmail.com>
Subject Re: [lucy-dev] Ruby allocation/initialization
Date Mon, 09 Jan 2012 00:26:50 GMT
Very interesting suggestions; I believe the code you supplied will
work perfectly. I was also dismayed that Ruby would force us to
change how Clownfish objects are constructed, so I did some more code
spelunking to see whether there is any other way around this.

I did a bit more research into how other modules address this issue, and
found that the allocator can wrap a NULL pointer, which the
initialization function then populates.

An example of this would be the following:

static VALUE cfc_hierarchy_alloc(VALUE klass) {
    VALUE self_rb = Qnil;
    void *ptr = NULL;
    self_rb = Data_Wrap_Struct(klass, NULL, NULL, ptr);
    return self_rb;
}

static VALUE cfc_hierarchy_init(VALUE self_rb, VALUE source, VALUE dest) {
    /* The wrapper still holds NULL here, so there is nothing to fetch
     * with Data_Get_Struct; build the real object and store it. */
    CFCHierarchy *self
        = CFCHierarchy_new(StringValuePtr(source), StringValuePtr(dest));
    DATA_PTR(self_rb) = self;
    return self_rb;
}


I found this pattern in the Ruby extension ext/dbm/dbm.c. Doing it
this way would let us keep an allocator and whatever benefits it
brings (I'm not certain there really are any at this point). It is at
least another way of doing it.
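
To make the two stages concrete without pulling in ruby.h, here is a
rough plain-C model of the pattern. Everything in it is hypothetical:
Wrapper stands in for the Ruby data object, Hierarchy stands in for
CFCHierarchy, and the wrapper_* names are made up for illustration.
The NULL check in wrapper_get() is an addition of this sketch, not
something taken from the code above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CFCHierarchy. */
typedef struct {
    char *source;
    char *dest;
} Hierarchy;

/* Hypothetical stand-in for the Ruby wrapper object: a box around one
 * pointer, playing the role of DATA_PTR(self_rb). */
typedef struct {
    void *data;
} Wrapper;

/* Duplicate a C string; returns NULL if allocation fails. */
static char *copy_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Stage 1 (the "allocator"): create the wrapper with an empty payload. */
static Wrapper *wrapper_alloc(void) {
    Wrapper *w = malloc(sizeof(Wrapper));
    if (w != NULL) {
        w->data = NULL;
    }
    return w;
}

/* Stage 2 (the "initializer"): build the real object and store it in
 * the wrapper that stage 1 produced. Returns NULL on allocation failure. */
static Wrapper *wrapper_init(Wrapper *w, const char *source, const char *dest) {
    assert(w != NULL);
    Hierarchy *h = malloc(sizeof(Hierarchy));
    if (h == NULL) {
        return NULL;
    }
    h->source = copy_string(source);
    h->dest   = copy_string(dest);
    w->data   = h;
    return w;
}

/* Fetch the payload; NULL means the wrapper was allocated but never
 * initialized, so callers can detect that state. */
static const Hierarchy *wrapper_get(const Wrapper *w) {
    return w->data;
}

/* Release the payload (if any) and the wrapper itself. */
static void wrapper_free(Wrapper *w) {
    if (w == NULL) {
        return;
    }
    Hierarchy *h = w->data;
    if (h != NULL) {
        free(h->source);
        free(h->dest);
        free(h);
    }
    free(w);
}
```

The point of the model is that between wrapper_alloc() and
wrapper_init() the object exists but holds nothing, so any method that
can run in that window would need to tolerate a NULL payload.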

If we use rb_define_singleton_method, would we have any state issues
later, since that class is now a singleton? Or, for our purposes, does
it even matter? Thoughts? I'm not certain at this point which
would be the ideal way forward.

Thanks,
Logan


On Sat, Jan 7, 2012 at 12:47 PM, Marvin Humphrey <marvin@rectangular.com> wrote:
> On Thu, Jan 05, 2012 at 11:33:12AM -0800, Logan Bell wrote:
>> The allocation function and the need to create an empty object have had me
>> digging a bit more in the pickaxe book. On when an allocator is needed, it
>> says: "if the object you’re implementing doesn’t use any data other than
>> Ruby instance variables, then you don’t need to write an allocation
>> function—Ruby’s default allocator will work just fine. " If I understand
>> that correctly, since our (Clownfish::CFC::Hierarchy) object does need data
>> then we need to allocate the space up front in the allocator function.
>>
>> Further it goes on to outline reasons why this is necessary ( marshaling as
>> you pointed out being one of them ):
>>
>> "One of the reasons for this multistep object creation protocol is that it
>> lets the interpreter handle situations where objects have to be created by
>> “back-door means.” One example is when objects are being deserialized from
>> their marshaled form. Here, the interpreter needs to create an empty object
>> (by calling the allocator), but it cannot call the initializer (because it
>> has no knowledge of the parameters to use). Another common situation is
>> when objects are duplicated or cloned."
>>
>> It might be worth doing some code diving on the ruby end to see for sure,
>> but I can see value in having constructors that accept no arguments.
>
> Clownfish actually provides a direct analogue to Ruby's Class#allocate:
> VTable_Make_Obj().
>
>    /** Create an empty object of the type defined by the VTable: allocate,
>     * assign its vtable and give it an initial refcount of 1.  The caller is
>     * responsible for initialization.
>     */
>    Obj*
>    Make_Obj(VTable *self);
>
> For an example of how VTable_Make_Obj() is used during deserialization, here's
> Freezer_thaw() from core/Lucy/Util/Freezer.c:
>
>    Obj*
>    Freezer_thaw(InStream *instream) {
>        CharBuf *class_name
>            = CB_Deserialize((CharBuf*)VTable_Make_Obj(CHARBUF), instream);
>        VTable *vtable = VTable_singleton(class_name, NULL);
>        Obj *blank = VTable_Make_Obj(vtable);
>        DECREF(class_name);
>        return Obj_Deserialize(blank, instream);
>    }
>
> Freezer_thaw() obtains the class name, uses it to look up the right VTable
> singleton, then invokes VTable_Make_Obj() to create the blank object.  The
> newborn blank object doesn't start off with much, but at least it has a VTable
> -- so we can invoke the Deserialize() object method on it and flesh it out.
>
> We also use VTable_Make_Obj() for every Lucy object that we create from
> Perl-space.  Our Foo_new() C functions have a limitation: they do not take a
> class name as an argument, so they cannot support dynamic subclassing.  For
> instance, here is Normalizer_new():
>
>    Normalizer*
>    Normalizer_new(const CharBuf *form, bool_t case_fold, bool_t strip_accents) {
>        Normalizer *self = (Normalizer*)VTable_Make_Obj(NORMALIZER);
>        return Normalizer_init(self, form, case_fold, strip_accents);
>    }
>
> Because the VTable is hard-coded to NORMALIZER, objects created via
> Normalizer_new() will *always* have a class of "Lucy::Analysis::Normalizer".
> But what if you create a Perl subclass of Lucy::Analysis::Normalizer called
> "MyNormalizer"?
>
>    package MyNormalizer;
>    use base qw( Lucy::Analysis::Normalizer );
>
>    my $normalizer = MyNormalizer->new;
>
> Here's how Normalizer_new() would need to change in order to support such
> subclassing:
>
>    Normalizer*
>    Normalizer_new(CharBuf *class_name, const CharBuf *form,
>                   bool_t case_fold, bool_t strip_accents) {
>        VTable *vtable = VTable_singleton(class_name, NULL);
>        Normalizer *self = (Normalizer*)VTable_Make_Obj(vtable);
>        return Normalizer_init(self, form, case_fold, strip_accents);
>    }
>
> The actual code which *does* support subclassing for Normalizer is spread
> across three functions, two of which I've included below my sig for reference:
>
>  * XSBind_new_blank_obj() from perl/xs/XSBind.c, which wraps
>    VTable_Make_Obj().
>  * XS_Lucy_Analysis_Normalizer_new() from Lucy.xs, which is auto-generated.
>  * Normalizer_init(), from core/Lucy/Analysis/Normalizer.c.
>
> In order to support dynamic subclassing in the Ruby bindings for Lucy, we will
> need to provide similar functionality.
>
> However, I question whether we need to provide that kind of functionality for
> Clownfish::CFC, which is itself written using a much cruder object model:
>
>  * No support for subclassing.
>  * No support for serialization.
>  * No support for Ruby's #clone or #dup methods.
>
> I don't yet understand why Ruby *needs* an allocator function if we aren't
> going to use those bells and whistles.  How many C libraries out there provide
> two-stage constructors?  It doesn't make sense that Ruby would impose such an
> esoteric requirement, limiting the kinds of C libraries you could write Ruby
> bindings for.
>
> Something like this ought to work:
>
>    // Clownfish::CFC::Hierarchy#new
>    static VALUE
>    S_CFCHierarchy_new(VALUE klass, VALUE source_rb, VALUE dest_rb) {
>        const char *source = StringValuePtr(source_rb);
>        const char *dest   = StringValuePtr(dest_rb);
>        CFCHierarchy *self = CFCHierarchy_new(source, dest);
>        return Data_Wrap_Struct(klass, NULL, NULL, self);
>    }
>
>    // Bootstrap Clownfish::CFC::Hierarchy.
>    static void
>    S_Init_CFCHierarchy() {
>        cHierarchy  = rb_define_class_under(mCFC, "Hierarchy", rb_cObject);
>        rb_define_method(cHierarchy, "build", S_CFCHierarchy_build, 0);
>        rb_define_singleton_method(cHierarchy, "new", S_CFCHierarchy_new, 2);
>    }
>
>    // Bootstrap Clownfish::CFC and all of its components.
>    void
>    Init_CFC() {
>        mClownfish = rb_define_module("Clownfish");
>        mCFC       = rb_define_module_under(mClownfish, "CFC");
>        S_Init_CFCHierarchy();
>    }
>
> I don't know whether that's an idiomatic approach for writing a Ruby extension,
> but if it works, it prevents us from having to add a bunch of CFCFoo_allocate()
> functions and from having to provide two-stage constructors for every
> Clownfish::CFC component.
>
> In any case, exploring this topic for the CFC bindings helps us to understand
> the issues we will confront when auto-generating Ruby wrapper code via the
> as-yet-to-be-written Clownfish::CFC::Binding::Ruby.  :)
>
> Marvin Humphrey
>
>
> cfish_Obj*
> XSBind_new_blank_obj(SV *either_sv) {
>    cfish_VTable *vtable;
>
>    // Get a VTable.
>    if (sv_isobject(either_sv)
>        && sv_derived_from(either_sv, "Lucy::Object::Obj")
>       ) {
>        // Use the supplied object's VTable.
>        IV iv_ptr = SvIV(SvRV(either_sv));
>        cfish_Obj *self = INT2PTR(cfish_Obj*, iv_ptr);
>        vtable = self->vtable;
>    }
>    else {
>        // Use the supplied class name string to find a VTable.
>        STRLEN len;
>        char *ptr = SvPVutf8(either_sv, len);
>        cfish_ZombieCharBuf *klass = CFISH_ZCB_WRAP_STR(ptr, len);
>        vtable = cfish_VTable_singleton((cfish_CharBuf*)klass, NULL);
>    }
>
>    // Use the VTable to allocate a new blank object of the right size.
>    return Cfish_VTable_Make_Obj(vtable);
> }
>
>
> XS(XS_Lucy_Analysis_Normalizer_new) {
>    dXSARGS;
>    CHY_UNUSED_VAR(cv);
>    if (items < 1) {
>        CFISH_THROW(CFISH_ERR, "Usage: %s(class_name, ...)", GvNAME(CvGV(cv)));
>    }
>    SP -= items;
>
>    const lucy_CharBuf* normalization_form = NULL;
>    chy_bool_t case_fold = true;
>    chy_bool_t strip_accents = false;
>    chy_bool_t args_ok = XSBind_allot_params(
>        &(ST(0)), 1, items, "Lucy::Analysis::Normalizer::new_PARAMS",
>        ALLOT_OBJ(&normalization_form, "normalization_form", 18, false,
>                  LUCY_CHARBUF, alloca(cfish_ZCB_size())),
>        ALLOT_BOOL(&case_fold, "case_fold", 9, false),
>        ALLOT_BOOL(&strip_accents, "strip_accents", 13, false),
>        NULL);
>    if (!args_ok) {
>        CFISH_RETHROW(CFISH_INCREF(cfish_Err_get_error()));
>    }
>    lucy_Normalizer* self = (lucy_Normalizer*)XSBind_new_blank_obj(ST(0));
>
>    lucy_Normalizer* retval = lucy_Normalizer_init(self, normalization_form,
>                                                   case_fold, strip_accents);
>    if (retval) {
>        ST(0) = (SV*)Cfish_Obj_To_Host((cfish_Obj*)retval);
>        Cfish_Obj_Dec_RefCount((cfish_Obj*)retval);
>    }
>    else {
>        ST(0) = newSV(0);
>    }
>    sv_2mortal(ST(0));
>    XSRETURN(1);
> }
>
>
>
