From: Logan Bell
To: lucy-dev@incubator.apache.org
Date: Sun, 8 Jan 2012 16:26:50 -0800
Subject: Re: [lucy-dev] Ruby allocation/initialization
In-Reply-To: <20120107204732.GA30176@rectangular.com>
References: <20120105073824.93378238890B@eris.apache.org> <20120105184129.GA30433@rectangular.com> <20120107204732.GA30176@rectangular.com>

Very interesting suggestions; I believe the code you supplied will work
perfectly. I was dismayed as well that Ruby would force us to change how
Clownfish objects are constructed, so I did some more code spelunking to see
if there were any other ways around this. I did a bit more research into how
other modules address this issue, and found that the allocator can wrap an
empty (NULL) pointer that is then populated later in the initialization
function. An example of this would be the following:

    static VALUE
    cfc_hierarchy_alloc(VALUE klass) {
        VALUE self_rb = Qnil;
        void *ptr = NULL;
        // Wrap a NULL pointer for now; the initializer fills it in.
        self_rb = Data_Wrap_Struct(klass, NULL, NULL, ptr);
        return self_rb;
    }

    static VALUE
    cfc_hierarchy_init(VALUE self_rb, VALUE source, VALUE dest) {
        CFCHierarchy* self;
        // Fetches the placeholder pointer, which is still NULL here.
        Data_Get_Struct(self_rb, CFCHierarchy, self);
        self = CFCHierarchy_new(StringValuePtr(source), StringValuePtr(dest));
        DATA_PTR(self_rb) = self;
        return self_rb;
    }

I found this pattern in use in the Ruby extension ext/dbm/dbm.c. Doing it
this way would allow us to still leverage the benefits of having an allocator
(I'm not certain there really are any at this point). At the least, it is
another way of doing it.

If one uses rb_define_singleton_method, would we have any state issues later
since that class is now a singleton? Or for our purposes, does it really even
matter?

Thoughts? I'm not certain at this point which would be the ideal way of going
forward.
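To make the two-stage idea concrete without pulling in ruby.h, here is a minimal plain-C sketch of the same shape: an allocator that wraps a NULL payload and an initializer that fills it in later. The `Wrapper` and `Hierarchy` types and every function name below are hypothetical stand-ins, not the Ruby C API and not the real CFCHierarchy; in the actual extension the wrapper role would be played by the VALUE returned from Data_Wrap_Struct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CFCHierarchy. */
typedef struct Hierarchy {
    char *source;
    char *dest;
} Hierarchy;

/* Hypothetical stand-in for the Ruby object that wraps a C pointer. */
typedef struct Wrapper {
    void *data; /* plays the role of DATA_PTR(self_rb) */
} Wrapper;

/* Copy a string into freshly malloc'd memory. */
static char *
dup_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) { memcpy(copy, s, len); }
    return copy;
}

/* Release a Hierarchy and the strings it owns; NULL is allowed. */
void
hierarchy_free(Hierarchy *self) {
    if (self == NULL) { return; }
    free(self->source);
    free(self->dest);
    free(self);
}

/* Single-stage constructor, analogous in shape to CFCHierarchy_new(). */
Hierarchy *
hierarchy_new(const char *source, const char *dest) {
    Hierarchy *self = malloc(sizeof(Hierarchy));
    if (self == NULL) { return NULL; }
    self->source = dup_string(source);
    self->dest   = dup_string(dest);
    if (self->source == NULL || self->dest == NULL) {
        hierarchy_free(self);
        return NULL;
    }
    return self;
}

/* Stage 1: create the wrapper with an empty (NULL) payload. */
Wrapper *
wrapper_alloc(void) {
    Wrapper *w = malloc(sizeof(Wrapper));
    if (w != NULL) { w->data = NULL; }
    return w;
}

/* Stage 2: build the real object and attach it to the wrapper. */
Wrapper *
wrapper_init(Wrapper *w, const char *source, const char *dest) {
    assert(w != NULL && w->data == NULL); /* must be freshly allocated */
    w->data = hierarchy_new(source, dest);
    return w;
}

/* Release the wrapper and whatever payload it holds. */
void
wrapper_free(Wrapper *w) {
    if (w == NULL) { return; }
    hierarchy_free((Hierarchy *)w->data);
    free(w);
}
```

One consequence of this shape: between the two stages the payload is NULL, so any code that can reach an allocated but uninitialized wrapper has to check for that before dereferencing it.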

Thanks,
Logan

On Sat, Jan 7, 2012 at 12:47 PM, Marvin Humphrey wrote:
> On Thu, Jan 05, 2012 at 11:33:12AM -0800, Logan Bell wrote:
>> In regard to the allocation function and the need to create an empty object
>> has had me digging a bit more in the pickaxe book. The allocator is only
>> needed "if the object you’re implementing doesn’t use any data other than
>> Ruby instance variables, then you don’t need to write an allocation
>> function--Ruby’s default allocator will work just fine. " If I understand
>> that correctly, since our (Clownfish::CFC::Hierarchy) object does need data
>> then we need to allocate the space up front in the allocator function.
>>
>> Further it goes on to outline reasons why this is necessary ( marshaling as
>> you pointed out being one of them ):
>>
>> "One of the reasons for this multistep object creation protocol is that it
>> lets the interpreter handle situations where objects have to be created by
>> “back-door means.” One example is when objects are being deserialized from
>> their marshaled form. Here, the interpreter needs to create an empty object
>> (by calling the allocator), but it cannot call the initializer (because it
>> has no knowledge of the parameters to use). Another common situation is
>> when objects are duplicated or cloned."
>>
>> It might be worth doing some code diving on the ruby end to see for sure,
>> but I can see value in having constructors that accept no arguments.
>
> Clownfish actually provides a direct analogue to Ruby's Class#allocate:
> VTable_Make_Obj().
>
>     /** Create an empty object of the type defined by the VTable: allocate,
>      * assign its vtable and give it an initial refcount of 1.  The caller is
>      * responsible for initialization.
> =A0 =A0 */ > =A0 =A0Obj* > =A0 =A0Make_Obj(VTable *self); > > For an example of how VTable_Make_Obj() is used during deserialization, h= ere's > Freezer_thaw() from core/Lucy/Util/Freezer.c: > > =A0 =A0Obj* > =A0 =A0Freezer_thaw(InStream *instream) { > =A0 =A0 =A0 =A0CharBuf *class_name > =A0 =A0 =A0 =A0 =A0 =A0=3D CB_Deserialize((CharBuf*)VTable_Make_Obj(CHARB= UF), instream); > =A0 =A0 =A0 =A0VTable *vtable =3D VTable_singleton(class_name, NULL); > =A0 =A0 =A0 =A0Obj *blank =3D VTable_Make_Obj(vtable); > =A0 =A0 =A0 =A0DECREF(class_name); > =A0 =A0 =A0 =A0return Obj_Deserialize(blank, instream); > =A0 =A0} > > Freezer_thaw() obtains the class name, uses it to look up the right VTabl= e > singleton, then invokes VTable_Make_Obj() to create the blank object. =A0= The > newborn blank object doesn't start off with much, but at least it has a V= Table > -- so we can invoke the Deserialize() object method on it and flesh it ou= t. > > We also use VTable_Make_Obj() for every Lucy object that we create from > Perl-space. =A0Our Foo_new() C functions have a limitation: they do not t= ake a > class name as an argument, so they cannot support dynamic subclassing. = =A0For > instance, here is Normalizer_new(): > > =A0 =A0Normalizer* > =A0 =A0Normalizer_new(const CharBuf *form, bool_t case_fold, bool_t strip= _accents) { > =A0 =A0 =A0 =A0Normalizer *self =3D (Normalizer*)VTable_Make_Obj(NORMALIZ= ER); > =A0 =A0 =A0 =A0return Normalizer_init(self, form, case_fold, strip_accent= s); > =A0 =A0} > > Because the VTable is hard-coded to NORMALIZER, objects created via > Normalizer_new() will *always* have a class of "Lucy::Analysis::Normalize= r". > But what if you create a Perl subclass of Lucy::Analysis::Normalizer call= ed > "MyNormalizer"? 
>
>     package MyNormalizer;
>     use base qw( Lucy::Analysis::Normalizer );
>
>     my $normalizer = MyNormalizer->new;
>
> Here's how Normalizer_new() would need to change in order to support such
> subclassing:
>
>     Normalizer*
>     Normalizer_new(CharBuf *class_name, const CharBuf *form,
>                    bool_t case_fold, bool_t strip_accents) {
>         VTable *vtable = VTable_singleton(class_name, NULL);
>         Normalizer *self = (Normalizer*)VTable_Make_Obj(vtable);
>         return Normalizer_init(self, form, case_fold, strip_accents);
>     }
>
> The actual code which *does* support subclassing for Normalizer is spread
> across three functions, two of which I've included below my sig for reference:
>
>  * XSBind_new_blank_obj() from perl/xs/XSBind.c, which wraps
>    VTable_Make_Obj().
>  * XS_Lucy_Analysis_Normalizer_new() from Lucy.xs, which is auto-generated.
>  * Normalizer_init(), from core/Lucy/Analysis/Normalizer.c.
>
> In order to support dynamic subclassing in the Ruby bindings for Lucy, we will
> need to provide similar functionality.
>
> However, I question whether we need to provide that kind of functionality for
> Clownfish::CFC, which is itself written using a much cruder object model:
>
>  * No support for subclassing.
>  * No support for serialization.
>  * No support for Ruby's #clone or #dup methods.
>
> I don't yet understand why Ruby *needs* an allocator function if we aren't
> going to use those bells and whistles.  How many C libraries out there provide
> two-stage constructors?  It doesn't make sense that Ruby would impose such an
> esoteric requirement, limiting the kinds of C libraries you could write Ruby
> bindings for.
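A stripped-down, plain-C sketch of the class-name-driven construction described above may help when comparing it with Ruby's allocate/initialize split. It is only an illustration under loud assumptions: the `VTable`, `Obj`, and `Normalizer` structs here are simplified hypothetical versions, and a fixed lookup table (`registry`, `vtable_lookup`) stands in for VTable_singleton(), which this sketch does not attempt to reproduce. None of these lowercase names exist in Lucy or Clownfish.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical vtable: just a class name and an object size. */
typedef struct VTable {
    const char *class_name;
    size_t      obj_size;
} VTable;

/* Every object begins with its vtable pointer and a refcount. */
typedef struct Obj {
    const VTable *vtable;
    unsigned      refcount;
} Obj;

/* Hypothetical subclassable type; Obj must be the first member. */
typedef struct Normalizer {
    Obj base;
    int case_fold;
    int strip_accents;
} Normalizer;

/* Assumption: a fixed table replaces dynamic vtable registration. */
static const VTable registry[] = {
    { "Lucy::Analysis::Normalizer", sizeof(Normalizer) },
    { "MyNormalizer",               sizeof(Normalizer) },
};

/* Find the vtable for a class name, or NULL if it is unknown. */
const VTable *
vtable_lookup(const char *class_name) {
    size_t count = sizeof(registry) / sizeof(registry[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(registry[i].class_name, class_name) == 0) {
            return &registry[i];
        }
    }
    return NULL;
}

/* Stage 1: blank object with its vtable set and a refcount of 1. */
Obj *
vtable_make_obj(const VTable *vtable) {
    assert(vtable != NULL && vtable->obj_size >= sizeof(Obj));
    Obj *obj = calloc(1, vtable->obj_size);
    if (obj == NULL) { return NULL; }
    obj->vtable   = vtable;
    obj->refcount = 1;
    return obj;
}

/* Stage 2: fill in the fields of an already-allocated object. */
Normalizer *
normalizer_init(Normalizer *self, int case_fold, int strip_accents) {
    self->case_fold     = case_fold;
    self->strip_accents = strip_accents;
    return self;
}

/* Constructor that takes a class name, so subclasses keep their own class. */
Normalizer *
normalizer_new(const char *class_name, int case_fold, int strip_accents) {
    const VTable *vtable = vtable_lookup(class_name);
    if (vtable == NULL) { return NULL; }
    Normalizer *self = (Normalizer *)vtable_make_obj(vtable);
    if (self == NULL) { return NULL; }
    return normalizer_init(self, case_fold, strip_accents);
}
```

The point of the split is that the blank object already carries its class before any initializer runs, which is the same property Ruby's allocator provides.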
>
> Something like this ought to work:
>
>     // Clownfish::CFC::Hierarchy#new
>     static VALUE
>     S_CFCHierarchy_new(VALUE klass, VALUE source_rb, VALUE dest_rb) {
>         const char *source = StringValuePtr(source_rb);
>         const char *dest   = StringValuePtr(dest_rb);
>         CFCHierarchy *self = CFCHierarchy_new(source, dest);
>         return Data_Wrap_Struct(klass, NULL, NULL, self);
>     }
>
>     // Bootstrap Clownfish::CFC::Hierarchy.
>     static void
>     S_Init_CFCHierarchy() {
>         cHierarchy  = rb_define_class_under(mCFC, "Hierarchy", rb_cObject);
>         rb_define_method(cHierarchy, "build", S_CFCHierarchy_build, 0);
>         rb_define_singleton_method(cHierarchy, "new", S_CFCHierarchy_new, 2);
>     }
>
>     // Bootstrap Clownfish::CFC and all of its components.
>     void
>     Init_CFC() {
>         mClownfish = rb_define_module("Clownfish");
>         mCFC       = rb_define_module_under(mClownfish, "CFC");
>         S_Init_CFCHierarchy();
>     }
>
> I don't know whether that's an idiomatic approach for writing a Ruby extension,
> but if it works, it prevents us from having to add a bunch of CFCFoo_allocate()
> functions and from having to provide two-stage constructors for every
> Clownfish::CFC component.
>
> In any case, exploring this topic for the CFC bindings helps us to understand
> the issues we will confront when auto-generating Ruby wrapper code via the
> as-yet-to-be-written Clownfish::CFC::Binding::Ruby.  :)
>
> Marvin Humphrey
>
>
> cfish_Obj*
> XSBind_new_blank_obj(SV *either_sv) {
>     cfish_VTable *vtable;
>
>     // Get a VTable.
>     if (sv_isobject(either_sv)
>         && sv_derived_from(either_sv, "Lucy::Object::Obj")
>        ) {
>         // Use the supplied object's VTable.
>         IV iv_ptr = SvIV(SvRV(either_sv));
>         cfish_Obj *self = INT2PTR(cfish_Obj*, iv_ptr);
>         vtable = self->vtable;
>     }
>     else {
>         // Use the supplied class name string to find a VTable.
>         STRLEN len;
>         char *ptr = SvPVutf8(either_sv, len);
>         cfish_ZombieCharBuf *klass = CFISH_ZCB_WRAP_STR(ptr, len);
>         vtable = cfish_VTable_singleton((cfish_CharBuf*)klass, NULL);
>     }
>
>     // Use the VTable to allocate a new blank object of the right size.
>     return Cfish_VTable_Make_Obj(vtable);
> }
>
>
> XS(XS_Lucy_Analysis_Normalizer_new) {
>     dXSARGS;
>     CHY_UNUSED_VAR(cv);
>     if (items < 1) { CFISH_THROW(CFISH_ERR, "Usage: %s(class_name, ...)", GvNAME(CvGV(cv))); }
>     SP -= items;
>
>     const lucy_CharBuf* normalization_form = NULL;
>     chy_bool_t case_fold = true;
>     chy_bool_t strip_accents = false;
>     chy_bool_t args_ok = XSBind_allot_params(
>         &(ST(0)), 1, items, "Lucy::Analysis::Normalizer::new_PARAMS",
>         ALLOT_OBJ(&normalization_form, "normalization_form", 18, false, LUCY_CHARBUF, alloca(cfish_ZCB_size())),
>         ALLOT_BOOL(&case_fold, "case_fold", 9, false),
>         ALLOT_BOOL(&strip_accents, "strip_accents", 13, false),
>         NULL);
>     if (!args_ok) {
>         CFISH_RETHROW(CFISH_INCREF(cfish_Err_get_error()));
>     }
>     lucy_Normalizer* self = (lucy_Normalizer*)XSBind_new_blank_obj(ST(0));
>
>     lucy_Normalizer* retval = lucy_Normalizer_init(self, normalization_form, case_fold, strip_accents);
>     if (retval) {
>         ST(0) = (SV*)Cfish_Obj_To_Host((cfish_Obj*)retval);
>         Cfish_Obj_Dec_RefCount((cfish_Obj*)retval);
>     }
>     else {
>         ST(0) = newSV(0);
>     }
>     sv_2mortal(ST(0));
>     XSRETURN(1);
> }