lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-dev] Proposal for implementation of immutable strings
Date Tue, 30 Apr 2013 00:35:18 GMT
On Sat, Apr 27, 2013 at 7:27 AM, Nick Wellnhofer <wellnhofer@aevum.de> wrote:
> Or RETAIN/RELEASE like in Objective-C?

The colloquial meaning of the word "retain" has the same desirable property
as that of "capture" -- unlike "incref" (increment reference count), it does
not necessarily connote mutation of the object it's invoked upon.

It's also true that `retain` in Objective-C returns an object pointer, like
our current `INCREF`.

However, the Objective-C behavior is not quite the same as the proposed
behavior, in a subtle and important way.  Invoking Objective-C `retain` in a
void context may be valid; invoking the new behavior proposed for Clownfish in
a void context would run the risk of leaking objects and eventually causing a
memory error.  Capturing the returned reference is **mandatory**.

    // Bad.
    CAPTURE(string);        // leaks the copy if string is stack-allocated
    VA_Push(array, (Obj*)string);    // stores pointer to stack string (bad!)
    ...
    DECREF(array);       // exception if you're lucky, segfault if you're not

    // Good.
    VA_Push(array, (Obj*)CAPTURE(string));
    DECREF(array);

I don't know of any other reference counting mechanisms with such a behavior.
For that reason, I have a mild preference for a novel name.
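
To make the mandatory-capture rule concrete, here is a minimal, self-contained C sketch. It does not use the Clownfish API: `ToyString`, `toy_capture`, `toy_release`, and the `on_stack` flag are hypothetical stand-ins, and the copy-on-capture logic is only an assumption about how the proposal might behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a refcounted string; NOT the Clownfish String class. */
typedef struct ToyString {
    int         refcount;
    bool        on_stack;  /* true if the struct lives in a stack frame */
    const char *text;      /* borrowed for stack strings, owned for heap ones */
} ToyString;

/* Hypothetical capture: returns a reference the caller owns.
 * Heap object:  bump the refcount and return the same pointer.
 * Stack object: return a fresh heap copy; the original is not mutated. */
static ToyString*
toy_capture(ToyString *s) {
    if (!s->on_stack) {
        s->refcount++;
        return s;
    }
    size_t len  = strlen(s->text) + 1;
    char  *text = malloc(len);
    ToyString *copy = malloc(sizeof(ToyString));
    if (text == NULL || copy == NULL) { abort(); }
    memcpy(text, s->text, len);
    copy->refcount = 1;
    copy->on_stack = false;
    copy->text     = text;
    return copy;
}

/* Release one reference; heap objects are freed when the count hits zero. */
static void
toy_release(ToyString *s) {
    if (s->on_stack) { return; }
    if (--s->refcount == 0) {
        free((void*)s->text);
        free(s);
    }
}

/* Returns 1 if both capture paths behave as described above. */
int
toy_demo(void) {
    ToyString local = { 1, true, "hello" };
    ToyString *owned = toy_capture(&local);  /* return value must be kept */
    int ok = owned != &local
             && !owned->on_stack
             && strcmp(owned->text, "hello") == 0;
    ToyString *again = toy_capture(owned);   /* heap case: same pointer */
    ok = ok && again == owned && owned->refcount == 2;
    toy_release(again);
    toy_release(owned);
    return ok;
}
```

In this sketch, calling `toy_capture(&local)` and discarding the result would leak the heap copy, which is the hazard that makes a void-context call unsafe.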

A couple other tidbits about Objective-C:

*   `retain` and its related methods are deprecated in favor of ARC
    (automatic reference counting).
*   An `NSString*` object may in fact be an instance of a mutable subclass.
    See <http://blog.bignerdranch.com/803-about-mutability/> for a rundown,
    which I think illustrates why we're doing the right thing by making
    String immutable and divorcing it from CharBuf in the inheritance
    hierarchy.

One more implementation note: Clownfish-generated host bindings cannot wrap
host values for parameters which are marked as `decremented`, such as the
`element` parameter passed to `VA_Push()`:

    /** Push an item onto the end of a VArray.
     */
    void
    Push(VArray *self, decremented Obj *element = NULL);

The implementation copies the passed-in pointer directly, so it's up to the
caller to manage reference counting:

    void
    VA_push(VArray *self, Obj *element) {
        if (self->size == self->cap) {
            VA_Grow(self, Memory_oversize(self->size + 1, sizeof(Obj*)));
        }
        self->elems[self->size] = element; // <-------------------- HERE
        self->size++;
    }
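
The ownership transfer implied by `decremented` can be sketched with toy types. Everything below (`ToyObj`, `ToyArray`, `toy_array_push`, and the other `toy_*` helpers) is hypothetical and stands in for the real classes. The only behavior borrowed from the code above is that the push stores the pointer without touching its refcount.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object; a stand-in for Obj, NOT the real Clownfish type. */
typedef struct ToyObj {
    int refcount;
} ToyObj;

static ToyObj*
toy_new(void) {
    ToyObj *obj = malloc(sizeof(ToyObj));
    if (obj == NULL) { abort(); }
    obj->refcount = 1;
    return obj;
}

static ToyObj*
toy_incref(ToyObj *obj) {
    obj->refcount++;
    return obj;
}

static void
toy_decref(ToyObj *obj) {
    if (--obj->refcount == 0) { free(obj); }
}

typedef struct ToyArray {
    ToyObj **elems;
    size_t   size;
    size_t   cap;
} ToyArray;

/* Takes ownership of one reference to `element` (the "decremented"
 * contract): the pointer is stored as-is, with no incref. */
static void
toy_array_push(ToyArray *self, ToyObj *element) {
    if (self->size == self->cap) {
        size_t   new_cap = self->cap ? self->cap * 2 : 4;
        ToyObj **grown   = realloc(self->elems, new_cap * sizeof(ToyObj*));
        if (grown == NULL) { abort(); }
        self->elems = grown;
        self->cap   = new_cap;
    }
    self->elems[self->size++] = element;
}

/* Drops the one reference the array holds to each element. */
static void
toy_array_clear(ToyArray *self) {
    for (size_t i = 0; i < self->size; i++) {
        toy_decref(self->elems[i]);
    }
    free(self->elems);
    self->elems = NULL;
    self->size  = 0;
    self->cap   = 0;
}

/* Returns 1 if the caller's own reference survives the array's cleanup. */
int
toy_push_demo(void) {
    ToyArray array = { NULL, 0, 0 };
    ToyObj *obj = toy_new();                  /* refcount 1, owned by us   */
    toy_array_push(&array, toy_incref(obj));  /* refcount 2, array owns 1  */
    toy_array_clear(&array);                  /* array's reference dropped */
    int still_alive = (obj->refcount == 1);   /* our reference still valid */
    toy_decref(obj);                          /* last reference: freed     */
    return still_alive;
}
```

A caller that pushed `obj` without the extra `toy_incref` would be handing its only reference to the array, so any later use of `obj` after the array is cleared would touch freed memory.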

> UTF-8 and UTF-16 are useful for filenames, so I'd say that any Clownfish
> build should support these encodings.

+1 for multiple subclasses of CharBuf supporting UTF-8 and UTF-16.

+1 for supporting export from both CharBuf and String of NUL-terminated
malloc'd character arrays in both UTF-8 and UTF-16 (native-endian).

For String, though, it seems like one internal encoding matching the primary
host encoding ought to suffice.

> Using inheritance for encodings seems like a natural approach to me. Client
> code wouldn't even have to care about which subclass it's working with. But
> I don't have a strong opinion about how encodings are implemented
> internally.

The issues we're talking about are not just implementation details, though.
The subclasses would be visible, constructors may have to be named
differently, String can't be `final` (which rules out optimizations such as
direct method invocation and potential inlining), and so on.

Nevertheless, I agree that these interface decisions are minor in comparison
to the decision to introduce an immutable String class in the first place.

Marvin Humphrey
