lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-dev] OFFSET globals
Date Wed, 02 May 2012 21:56:26 GMT
On Sat, Apr 28, 2012 at 11:28 AM, Nathan Kurz <nate@verse.com> wrote:
> On Thu, Apr 26, 2012 at 6:23 PM, Marvin Humphrey <marvin@rectangular.com> wrote:
>> If that were _PERL, we would eliminate the Host_callback() indirection layer and
>> flesh out the function body with raw XS:
>
> +1.  I like that it removes a layer of indirection, and makes it
> clearer what the function does.

Sounds good -- I'll open an issue.

> And I might be overplanning, but I think there are cases where it might be
> nice to try to bridge between two host languages.  Host-language-specific
> naming would allow for this.

There are other obstacles:

  * What should Obj#To_Host() return when there are multiple hosts?
  * What about classes that use host language data structures directly, like
    Lucy::Document::Doc?
  * Where do we put the cached host object that makes it possible to support
    inside-out member variables under Perl?

The fundamental challenge is that the more tightly we integrate with the host
language, the harder it becomes to bridge between multiple hosts.

> But what I'd like to move towards is a policy of "if you can grab hold of
> it, you can subclass it", with the final or non-public classes being static
> or otherwise opaque in the object file.  This doesn't need to be absolute,
> and convention will still play a role, but reduced visibility equals reduced
> temptation.

I'm with you on reduced visibility for non-public classes -- that's just
best-practice information hiding.

Final classes are different, though.  They enable certain compiler
optimizations, e.g. inlining, because final methods can often be resolved to
specific functions, avoiding vtable dispatch (which is why virtual methods
are not the default in C++).
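
As a rough illustration of the difference, here is a minimal sketch in C.  All
of the names (Counter, CounterVTable, and so on) are hypothetical and do not
reflect Clownfish's actual object layout:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not Clownfish's real layout. */
typedef struct Counter Counter;

typedef struct CounterVTable {
    int (*next)(Counter *self);   /* overridable ("virtual") method slot */
} CounterVTable;

struct Counter {
    const CounterVTable *vtable;
    int value;
};

/* The concrete implementation of the method. */
static inline int
Counter_next_IMP(Counter *self) {
    assert(self != NULL);
    return ++self->value;
}

static const CounterVTable COUNTER_VTABLE = { Counter_next_IMP };

/* Virtual call: the target is fetched from the vtable at run time, so the
 * compiler generally cannot inline it. */
static int
Counter_next_virtual(Counter *self) {
    return self->vtable->next(self);
}

/* If Counter were final, no subclass could replace `next`, so the call can
 * bind directly to the implementation and is a candidate for inlining. */
static int
Counter_next_final(Counter *self) {
    return Counter_next_IMP(self);
}
```

Both wrappers return the same result for a plain Counter; the only difference
is whether the call target is known at compile time.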

> But I do want to make sure that Core->Compiled->Script hierarchies
> work identically to Core->Script.

+1 -- That's how it works now (parcel prefix bugs notwithstanding) -- so for
instance we can subclass Nick's test extension
LucyX::Analysis::WhitespaceTokenizer as "MyWhitespaceTokenizer" and it will
behave as expected.

> And certainly we want Core->Script->Script to work within the same host
> language.

+1 -- Also 100% supported.

> Do we care about Core->ScriptA->ScriptB?  Probably not, although maybe if
> ScriptA were something embedded like Lua.  But I see no harm in aiming
> for this capability as a polyglot point of pride.

I feel the attraction of this feature too, but to be honest, I think it's
deceptively difficult to implement.  Object life cycle issues such as
destructors are a real headache already without having to coordinate with
*multiple* external garbage collection systems.  And who wants to write the
tests to demonstrate that every last feature works under
Core->ScriptA->ScriptB?

In my opinion, we need to leave this item off the requirements list and
instead specify that Clownfish need only target one host language at a time.

> [...] Currently, it looks like it's possible for a Clownfish object to
> create a per-instance copy of the registered per-class VTable using
> VTable_singleton.
>
> Within Clownfish this copying is done as part of the process of creating a
> subclass, but I don't see any technical reasons that an object couldn't
> create this copy, modify it, use it, and never register it.  Multiple
> objects could even share this "private" and unregistered VTable, or others
> could copy and modify it creating an ad-hoc prototype system independent of
> the official class hierarchy.
>
> I'm not advocating for this, but wondered if this was a valuable capability
> that should be preserved or just a spandrel.

It's a spandrel[1].  In other words, it's an accident of the implementation
that per-instance VTables look doable.

I'd oppose supporting per-instance methods as a Clownfish feature.  They have
their place in Ruby (and JavaScript, etc), but they would be difficult to
support properly in other languages -- notably Perl.
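
To make the idea under discussion concrete, here is a minimal sketch of a
per-instance VTable in C.  The struct layout and every function name are
assumptions made up for illustration; this is not how VTable_singleton or any
other Clownfish routine is actually implemented:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout for illustration only. */
typedef struct Obj Obj;

typedef struct VTable {
    const char *class_name;
    const char *(*speak)(Obj *self);
} VTable;

struct Obj {
    VTable *vtable;
};

static const char*
Obj_speak_default(Obj *self) {
    (void)self;
    return "default";
}

static const char*
Obj_speak_custom(Obj *self) {
    (void)self;
    return "custom";
}

/* The "registered" per-class vtable. */
static VTable OBJ_VTABLE = { "Obj", Obj_speak_default };

/* Give one object a private, unregistered copy of its class's vtable and
 * override a method in that copy.  The caller supplies the storage. */
static void
Obj_use_private_vtable(Obj *self, VTable *storage) {
    assert(self != NULL && storage != NULL);
    memcpy(storage, self->vtable, sizeof(VTable));
    storage->speak = Obj_speak_custom;
    self->vtable = storage;
}
```

Note that in this sketch the modified object still reports the class name
"Obj", so a host language that dispatches by class name has no way to see the
per-instance override.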

One of the defining characteristics of the Clownfish undertaking is that we
do NOT need to innovate in the OO space.  We only need to support a vanilla
classical inheritance model -- and the ambitious part of our task is to
integrate that model into diverse environments and to provide idiomatic APIs
which make using Clownfish-based software from the host language feel as
natural as possible.

The only OO feature I think Clownfish needs to add is some sort of weak
multiple inheritance along the lines of mixins or interface inheritance.  Raw
single inheritance was good enough to build Lucy, but it's going to be hard to
live with such a limitation once Clownfish gains a public API.

> What I will be advocating for is a window of malleability before the
> classes are registered.  I'm with you regarding immutability once a
> class is "visible", but I'd like the host language to have a shot at
> things before the core classes are declared visible.
> I also want to make sure we preserve the current unadvertised ability to
> override the core methods with compiled "overlay" extensions using
> LD_PRELOAD.  But this is a secret I'm waiting to spring on you at some later
> point when I have a better feel for the whole situation.

Hmm... as you probably anticipated, I have certain reservations about
officially sanctioning a technique which violates encapsulation.  :)

But one quirk of this proposal is that it can only be achieved at the level of
the application, since libraries don't get to control LD_PRELOAD.  That
doesn't make it any less risky to reach in and monkey patch a core class, but
it *does* mean that the monkey who monkey patches is going to be the *same*
monkey who gets bitten to death by the spooky action-at-a-distance bugs their
tinkering unleashed.

I'm still not enthusiastic because I don't think it's in our interest to
forswear the compiler optimizations that hiding those function symbols
enables.

>> Even if you don't override any methods, the VTable stores other metadata,
>> like the class name.  (Which is why "VTable" should be renamed to
>> "MetaClass" -- it's not just an array of function pointers any more.)
>
> +0.  I see your logic, but I'm not a fan of "MetaClass" as a name.
> VTable might be technically inaccurate, but it sets up the right
> expectations.

I've come to believe that it's important that we ditch the name "VTable",
which is an accident of the historical implementation -- originally, it *was*
just an array of function pointers.  Very few of our users are going to have
any idea what on earth a "vtable" is; "MetaClass" is easier to grok.  And if
we change the implementation to use multiple vtables, "VTable" becomes even
less accurate.
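
A sketch of what I mean, in C.  The field names and the sample classes are
hypothetical, chosen only to show that the struct carries class metadata in
addition to the method table:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout for illustration; not the real Clownfish struct. */
typedef struct MetaClass MetaClass;
typedef void (*method_t)(void);

struct MetaClass {
    const char      *name;         /* class name, e.g. for host bindings */
    const MetaClass *parent;       /* superclass, or NULL at the root */
    size_t           num_methods;  /* number of entries in `methods` */
    const method_t  *methods;      /* the original "vtable" part */
};

/* Walk the parent chain to answer an is-a query; this needs only the
 * metadata, not the function pointers. */
static bool
MetaClass_is_a(const MetaClass *self, const MetaClass *ancestor) {
    assert(ancestor != NULL);
    for (; self != NULL; self = self->parent) {
        if (self == ancestor) { return true; }
    }
    return false;
}

/* Sample hierarchy with empty method tables. */
static const MetaClass OBJ       = { "Obj", NULL, 0, NULL };
static const MetaClass ANALYZER  = { "Analyzer", &OBJ, 0, NULL };
static const MetaClass TOKENIZER = { "WhitespaceTokenizer", &ANALYZER, 0, NULL };
```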

The other obvious candidate is "Class".  Personally, I like that a little less
than "MetaClass" because it's an overloaded term and it means we have to spell
instances "klass" to avoid colliding with the C++ keyword, but it's apparently
good enough for both Ruby and Java.

>> Thanks for starting up a requirements list, as recommended in that
>> Joshua Bloch presentation.
>
> Now I'm worried that you read every link I send you in full, and that
> if I were to accidentally send the wrong long thing we could lose you
> for weeks! I'll try to make sure to aim for high quality at least. :)

Haha, I'd seen that Bloch presentation before -- it's really good!

It would be a great exercise to apply Bloch's critiques to the classes that
end up in the Clownfish core -- especially since a lot of that presentation
deals with Java platform APIs which bear a strong resemblance to Clownfish.

Marvin Humphrey

[1] For those of you like me who weren't familiar with the term "spandrel",
    here's Wikipedia: http://en.wikipedia.org/wiki/Spandrel_(biology)
    "In evolutionary biology, a Spandrel is a phenotypic characteristic that
    is a byproduct of the evolution of some other characteristic, rather than
    a direct product of adaptive selection."
