From: Marvin Humphrey
To: dev@lucy.apache.org
Date: Mon, 29 Apr 2013 17:35:18 -0700
Subject: Re: [lucy-dev] Proposal for implementation of immutable strings

On Sat, Apr 27, 2013 at 7:27 AM, Nick Wellnhofer wrote:
> Or RETAIN/RELEASE like in Objective-C?

The colloquial meaning of the word "retain" has the same desirable property
as that of "capture" -- unlike "incref" (increment reference count), it does
not necessarily connote mutation of the object it's invoked upon. It's also
true that `retain` in Objective-C returns an object pointer, like our current
`INCREF`.

However, the Objective-C behavior is not quite the same as the proposed
behavior, in a subtle and important way. Invoking Objective-C `retain` in a
void context may be valid; invoking the new behavior proposed for Clownfish in
a void context would run the risk of leaking objects and eventually causing a
memory error. Capturing the returned reference is **mandatory**.

    // Bad.
    CAPTURE(string);              // leaks copy object if string is stack allocated
    VA_Push(array, (Obj*)string); // stores pointer to stack string (bad!)
    ...
    DECREF(array);                // exception if you're lucky, segfault if you're not

    // Good.
    VA_Push(array, (Obj*)CAPTURE(string));
    DECREF(array);

I don't know of any other reference counting mechanisms with such a behavior.
For that reason, I have a mild preference for a novel name.
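
To make the "mandatory return value" point concrete, here is a minimal sketch of how such a capture operation might behave. It is not Clownfish code: the `Str` struct, its `on_stack` flag, and the `str_capture`/`str_release` functions are hypothetical stand-ins invented for illustration. The only thing taken from the message above is the rule that capturing a stack-allocated string yields a new heap copy, so the caller has to keep the returned pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string object; not the real Clownfish struct. */
typedef struct Str {
    int   refcount;
    bool  on_stack;  /* true if the object itself lives in stack memory */
    char *text;      /* owned by heap objects, borrowed by stack objects */
} Str;

/* Hypothetical analogue of the proposed CAPTURE(): the caller must use the
 * returned pointer, because it may differ from the argument. */
static Str *str_capture(Str *self) {
    if (self->on_stack) {
        /* A stack object cannot outlive its frame, so hand back a heap copy. */
        size_t len  = strlen(self->text) + 1;
        Str   *copy = malloc(sizeof(Str));
        if (copy == NULL) { abort(); }
        copy->text = malloc(len);
        if (copy->text == NULL) { abort(); }
        memcpy(copy->text, self->text, len);
        copy->refcount = 1;
        copy->on_stack = false;
        return copy;
    }
    /* A heap object just gains one more owner. */
    self->refcount++;
    return self;
}

/* Hypothetical analogue of DECREF() for this sketch. */
static void str_release(Str *self) {
    if (self->on_stack) { return; }  /* stack objects are never freed */
    if (--self->refcount == 0) {
        free(self->text);
        free(self);
    }
}
```

Under these assumptions, calling `str_capture(&stack_str)` and discarding the result leaks the copy, while storing `&stack_str` itself in a long-lived container leaves a dangling pointer once the frame returns. Both failure modes match the "Bad" snippet above.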

A couple other tidbits about Objective-C:

*   `retain` and its related methods are deprecated in favor of ARC
    (automatic reference counting).
*   An `NSString*` object may in fact be an instance of a mutable subclass.
    See for a rundown, which I think illustrates why we're doing the right
    thing by making String immutable and divorcing it from CharBuf in the
    inheritance hierarchy.

One more implementation note: Clownfish-generated host bindings cannot wrap
host values for parameters which are marked as `decremented`, e.g. the
`element` parameter passed to VA_Push():

    /** Push an item onto the end of a VArray.
     */
    void
    Push(VArray *self, decremented Obj *element = NULL);

The implementation copies the passed-in pointer directly, so it's up to the
caller to manage reference counting:

    void
    VA_push(VArray *self, Obj *element) {
        if (self->size == self->cap) {
            VA_Grow(self, Memory_oversize(self->size + 1, sizeof(Obj*)));
        }
        self->elems[self->size] = element; // <-------------------- HERE
        self->size++;
    }

> UTF-8 and UTF-16 are useful for filenames, so I'd say that any Clownfish
> build should support these encodings.

+1 for multiple subclasses of CharBuf supporting UTF-8 and UTF-16.

+1 for supporting export from both CharBuf and String of NUL-terminated
malloc'd character arrays in both UTF-8 and UTF-16 (native-endian).

For String, though, it seems like one internal encoding matching the primary
host encoding ought to suffice.

> Using inheritance for encodings seems like a natural approach to me. Client
> code wouldn't even have to care about which subclass it's working with. But
> I don't have a strong opinion about how encodings are implemented
> internally.

The issues we're talking about are not just implementation details, though.
The subclasses would be visible, constructors may have to be named
differently, String can't be `final` (which affects optimizations like
different method invocation code and potential inlining), and so on.

Nevertheless, I agree that these interface decisions are minor in comparison
to the decision to introduce an immutable String class in the first place.

Marvin Humphrey