lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [lucy-dev] Proposal for implementation of immutable strings
Date Thu, 25 Apr 2013 23:48:55 GMT
On Thu, Apr 25, 2013 at 9:00 AM, Nick Wellnhofer <wellnhofer@aevum.de> wrote:
> So here's what I came up with.

The general approach looks great!  The `origin` member var seems like an
elegant solution to the shared-buffer problem. :)

>     * For substrings of zombie strings, another unique value can be
>       used.

Substrings of zombie strings are dangerous, because the buffer belonging to
the parent object may not outlive the substring.

User-defined procedures will encounter ZombieStrings via wrapped callbacks --
if a parameter is `String*`, they'll get a real String with content copied from
the host argument, but if it's `const String*`, they'll get a `ZombieString*`
wrapping the host string content.  Unless we want to require that `SubString`
operate on non-const `String*` (as we will for `Inc_RefCount`),
`ZStr_SubString()` will have to return a fully independent String object which
owns its own buffer.
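
To make the "fully independent String" idea concrete, here is a minimal
sketch.  The struct layout, `ZStr_wrap()`, `Str_destroy()`, and the use of
NULL as the zombie marker in `origin` are all assumptions made up for
illustration, not the actual Clownfish code; offsets are in bytes to keep it
short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout, for illustration only -- not the real Clownfish struct. */
typedef struct String {
    const char    *ptr;      /* start of the string's bytes */
    size_t         size;     /* length in bytes */
    struct String *origin;   /* assumed: self if we own the buffer, NULL for a zombie */
    int            refcount;
} String;

/* Wrap a borrowed buffer without copying.  The result lives on the caller's
 * stack and must not outlive `ptr`. */
static String
ZStr_wrap(const char *ptr, size_t size) {
    String zombie = { ptr, size, NULL, 0 };
    return zombie;
}

/* Substring of a zombie: copy the bytes so the result owns its buffer and
 * stays valid after the zombie's borrowed buffer goes away. */
static String*
ZStr_SubString(const String *self, size_t offset, size_t len) {
    /* Clamp the requested range to the parent's bounds. */
    if (offset > self->size)       { offset = self->size; }
    if (len > self->size - offset) { len = self->size - offset; }

    String *sub = malloc(sizeof(String));
    char   *buf = malloc(len + 1);
    if (!sub || !buf) { free(sub); free(buf); return NULL; }

    memcpy(buf, self->ptr + offset, len);
    buf[len] = '\0';
    sub->ptr      = buf;
    sub->size     = len;
    sub->origin   = sub;   /* owns its own buffer */
    sub->refcount = 1;
    return sub;
}

/* Free a heap String, releasing the buffer only if the String owns it. */
static void
Str_destroy(String *self) {
    if (self && self->origin == self) { free((char*)self->ptr); }
    free(self);
}
```

The cost is a copy on every substring of a zombie, but the result can be
handed to code that retains it without worrying about the host buffer.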

> For zombie strings, it's assumed that they don't have to care about the
> lifetime of the character buffer. So there are two cases left out:

> stack-allocated strings that own a buffer

Can we make that an invalid state and avoid it?

> and stack-allocated substrings of
> a normal string.

This subtle trap illustrates why we shouldn't expose ZombieString as a public
API.

IMO, we don't need to DECREF -- we should just allow the stack-allocated
substring to vanish.

If you're careful, you can guarantee that the real string outlives the
stack-allocated string.  That's effectively what we're doing when wrapping
host string arguments.
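
A rough sketch of that pattern, assuming made-up `RealString` and `StackSub`
types and a hypothetical `StackSub_of()` helper (none of these are existing
Clownfish names).  The stack substring borrows the parent's bytes, never
touches the refcount, and simply goes out of scope:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical refcounted string that owns (or shares) a buffer. */
typedef struct {
    const char *ptr;
    size_t      size;
    int         refcount;
} RealString;

/* Hypothetical stack-allocated view: borrows bytes, has no refcount. */
typedef struct {
    const char *ptr;
    size_t      size;
} StackSub;

/* Build a view into `parent` without incrementing its refcount.  The range
 * is clamped to the parent's bounds. */
static StackSub
StackSub_of(const RealString *parent, size_t offset, size_t len) {
    if (offset > parent->size)       { offset = parent->size; }
    if (len > parent->size - offset) { len = parent->size - offset; }
    StackSub view = { parent->ptr + offset, len };
    return view;
}

/* Returns nonzero if the text after the first `skip` bytes of `s` begins
 * with `prefix`.  The caller holds `s` for the whole call, so the real
 * string is guaranteed to outlive the stack-allocated view. */
static int
tail_starts_with(const RealString *s, size_t skip, const char *prefix) {
    StackSub tail = StackSub_of(s, skip, s->size);
    size_t   plen = strlen(prefix);
    /* `tail` vanishes on return; no DECREF because there was no INCREF. */
    return tail.size >= plen && memcmp(tail.ptr, prefix, plen) == 0;
}
```

The guarantee comes entirely from scoping: the view never escapes the
function that was handed the real string.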

> Then another unrelated question turned up. Originally, I planned to make
> Clownfish::String abstract, and implement different encodings in
> Clownfish::UTF8String, etc. But it's also possible to implement the UTF-8
> encoding directly in Clownfish::String. This might make sense because UTF-8
> will be used in all but a few cases.

I think that's a good approach, but I have an ulterior motive -- I'm hoping
that ultimately we end up with one class handling all encodings, a la
<http://www.python.org/dev/peps/pep-0393/>.

In any case, it seems like an implementation decision we'll have the freedom
to change later as appropriate.
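
For reference, the PEP 393 idea in miniature: a single string type whose
storage width (1, 2, or 4 bytes per code point) is picked from the largest
code point it contains.  The `FlexString` type and function names below are
invented for this sketch and are not a proposal for the actual Clownfish
layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Storage width per code point, as in PEP 393's three "kinds". */
typedef enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 } StrKind;

/* Hypothetical single class covering all widths. */
typedef struct {
    StrKind     kind;
    size_t      length;   /* in code points */
    const void *data;     /* array of uint8_t, uint16_t, or uint32_t */
} FlexString;

/* Pick the narrowest width that can represent every code point. */
static StrKind
Kind_for_max_code_point(uint32_t max_cp) {
    if (max_cp < 0x100)   { return KIND_1BYTE; }
    if (max_cp < 0x10000) { return KIND_2BYTE; }
    return KIND_4BYTE;
}

/* Constant-time code point lookup, dispatching on the storage width.
 * The caller must ensure i < s->length. */
static uint32_t
FlexStr_code_point_at(const FlexString *s, size_t i) {
    switch (s->kind) {
        case KIND_1BYTE: return ((const uint8_t*)s->data)[i];
        case KIND_2BYTE: return ((const uint16_t*)s->data)[i];
        default:         return ((const uint32_t*)s->data)[i];
    }
}
```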

PS: Is it now true that ZombieStrings can only ever be allocated on the stack,
    rather than in static memory?  Because if that's the case, I'd favor the
    name StackString instead.

Marvin Humphrey
