lucy-dev mailing list archives

From Nick Wellnhofer <>
Subject Re: [lucy-dev] Proposal for implementation of immutable strings
Date Fri, 26 Apr 2013 11:10:28 GMT
On 26/04/2013 01:48, Marvin Humphrey wrote:
> Substrings of zombie strings are dangerous, because the buffer belonging to
> the parent object may not outlive the substring.

Right. This applies even to zombie strings on their own, without any
substrings involved. Consider assigning a zombie string to a member
variable. This is currently done with

     self->var = CB_Clone(zstr);

With immutable strings, we don't have to create a copy and can simply write

     self->var = (String*)INCREF(zstr);

This will break with zombie strings, because the stored reference may
outlive the stack-allocated zombie and the buffer it borrows.
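A minimal C sketch of the problem. It uses a hypothetical `Str` struct with an `is_zombie` flag, which is not Lucy's actual String or ZombieCharBuf layout. A zombie's header and borrowed buffer may die with the caller's stack frame, so the hypothetical helper `str_retain_or_clone()` only INCREFs heap strings and falls back to a copy for zombies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal string, for illustration only. */
typedef struct {
    size_t      refcount;
    const char *ptr;        /* character data */
    size_t      size;
    bool        is_zombie;  /* true: borrowed bytes, not refcounted */
} Str;

/* Heap-allocated string that owns a private copy of its bytes. */
static Str *str_new(const char *ptr, size_t size) {
    Str  *self = malloc(sizeof(Str));
    char *buf  = malloc(size + 1);
    if (self == NULL || buf == NULL) {
        free(self);
        free(buf);
        return NULL;
    }
    memcpy(buf, ptr, size);
    buf[size] = '\0';
    self->refcount  = 1;
    self->ptr       = buf;
    self->size      = size;
    self->is_zombie = false;
    return self;
}

/* Zombie string: wraps borrowed bytes and never owns them. */
static Str str_zombie(const char *ptr, size_t size) {
    Str z = { 0, ptr, size, true };
    return z;
}

/* Storing a string in a member variable: a plain INCREF is only safe for
 * heap strings.  A zombie (and the buffer it borrows) may disappear when
 * the caller returns, so it is copied into an owning heap string. */
static Str *str_retain_or_clone(Str *str) {
    if (str->is_zombie) {
        return str_new(str->ptr, str->size);
    }
    str->refcount++;
    return str;
}

static void str_release(Str *self) {
    if (self->is_zombie) {
        return;  /* zombies own nothing, so there is nothing to free */
    }
    if (--self->refcount == 0) {
        free((char *)self->ptr);
        free(self);
    }
}
```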

> User-defined procedures will encounter ZombieStrings via wrapped callbacks --
> if a parameter is `String*` they'll get a real String with copied content from
> host argument, but if it's `const String*`, they'll get a ZombieString*
> wrapping the host string content.

That's a great solution. But this isn't implemented yet, right? It would 
also require that String methods can be invoked on const Strings (like 
const member functions in C++). Would this work without further changes?
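In plain C, the closest analogue of a C++ const member function is a function whose `self` parameter is `const`-qualified. The sketch below assumes a hypothetical `Str` struct and treats methods as ordinary functions, ignoring Clownfish's method dispatch; `str_get_size`, `str_starts_with` and the callback `is_field_name` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string struct, for illustration only. */
typedef struct {
    const char *ptr;
    size_t      size;
} Str;

/* "Const methods": self is declared const, so these accept both
 * Str* and const Str* arguments without a cast. */
static size_t str_get_size(const Str *self) {
    return self->size;
}

static bool str_starts_with(const Str *self, const char *prefix) {
    size_t len = strlen(prefix);
    return len <= self->size && memcmp(self->ptr, prefix, len) == 0;
}

/* A user-defined callback that only reads its argument can declare it
 * const, so it can be handed a wrapper around host-owned bytes. */
static bool is_field_name(const Str *term) {
    return str_get_size(term) > 0 && str_starts_with(term, "field:");
}
```

A function whose `self` parameter is not const-qualified would draw a compiler diagnostic when passed a `const Str*`, so existing methods would presumably need const-qualified signatures before they could be called on such wrappers.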

> Unless we want to require that `SubString`
> operate on non-const String* (like we will for `Inc_RefCount`),
> ZStr_SubString() will have to return a fully independent String object which
> owns its own buffer.

That shouldn't be a problem. BTW, the INCREF macro should be changed so
that it doesn't accept const objects; see the example above.
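One possible way to make INCREF reject const objects, sketched with hypothetical `Obj`/`Str` structs and a hypothetical `obj_incref()` helper. If the macro casts its argument to `Obj*`, a const qualifier is silently lost. Passing the argument to a `void *` parameter without a cast lets any non-const object pointer convert implicitly, while a `const` pointer triggers a diagnostic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object layout: every object starts with a refcount. */
typedef struct {
    size_t refcount;
} Obj;

typedef struct {
    Obj         base;  /* first member, so a Str* can be viewed as an Obj* */
    const char *ptr;
    size_t      size;
} Str;

/* A casting macro such as
 *     #define INCREF(obj) obj_incref((Obj*)(obj))
 * would silently discard const.  Taking a plain `void *` instead means a
 * `const Str *` argument discards a qualifier, which compilers diagnose. */
static inline void *obj_incref(void *vself) {
    Obj *self = vself;
    self->refcount++;
    return vself;
}

#define INCREF(obj) obj_incref(obj)
```

With this definition, `const Str *c = &s; INCREF(c);` draws a diagnostic about discarding the `const` qualifier from GCC and Clang (an error under `-Werror`). The trade-off is that a `void *` parameter also accepts pointers that are not objects at all.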

>> For zombie strings, it's assumed that they don't have to care about the
>> lifetime of the character buffer. So there are two cases left out:
>> stack-allocated strings that own a buffer
> Can we make that an invalid state and avoid it?

Yes, we'll simply make the assumption that zombie strings never own a
buffer.

> I think that's a good approach, but I have an ulterior motive -- I'm hoping
> that ultimately we end up with one class handling all encodings, a la
> <>.

That would only require a member var to store the encoding. But I don't 
quite understand the rationale behind this. Does it have to do with the 
Python bindings?
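A sketch of what a member variable storing the encoding could look like, using hypothetical names (`Encoding`, `Str`, `str_length`): operations that depend on the representation switch on the stored encoding. UTF-16 data is assumed to be well-formed and in native byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encodings a single string class might track. */
typedef enum { ENC_LATIN1, ENC_UTF8, ENC_UTF16 } Encoding;

typedef struct {
    const void *ptr;
    size_t      size;      /* in bytes */
    Encoding    encoding;  /* the only extra member needed */
} Str;

/* Number of code points, dispatching on the stored encoding. */
static size_t str_length(const Str *self) {
    switch (self->encoding) {
    case ENC_LATIN1:
        return self->size;  /* one byte per code point */
    case ENC_UTF8: {
        const uint8_t *b = self->ptr;
        size_t n = 0;
        for (size_t i = 0; i < self->size; i++) {
            if ((b[i] & 0xC0) != 0x80) { n++; }  /* skip continuation bytes */
        }
        return n;
    }
    case ENC_UTF16: {
        const uint16_t *u = self->ptr;
        size_t units = self->size / 2, n = 0;
        for (size_t i = 0; i < units; i++) {
            if (u[i] < 0xDC00 || u[i] > 0xDFFF) { n++; }  /* skip low surrogates */
        }
        return n;
    }
    }
    return 0;
}
```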

> PS: Is it now true that ZombieStrings can only ever be allocated on the stack,
>      rather than in static memory?  Because if that's the case, I'd favor the
>      name StackString instead.

In ZombieKeyedHash, they're allocated from a MemPool. Otherwise, all 
allocations seem to be from the stack.
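For illustration, a hypothetical bump allocator standing in for a MemPool (this is not Lucy's actual MemPool API). Zombie headers allocated from it don't live on the stack, yet they are still released all at once when the pool goes away, and they still never own the bytes they point to. A name like StackString would not strictly describe this case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    const char *ptr;  /* borrowed bytes, never freed by the string */
    size_t      size;
} ZStr;

/* Hypothetical fixed-capacity bump allocator standing in for a MemPool. */
typedef struct {
    char  *buf;
    size_t cap;
    size_t used;
} Pool;

static int pool_init(Pool *pool, size_t cap) {
    pool->buf  = malloc(cap);
    pool->cap  = cap;
    pool->used = 0;
    return pool->buf != NULL;
}

static void *pool_alloc(Pool *pool, size_t size) {
    size_t align = _Alignof(max_align_t);
    size_t start = (pool->used + align - 1) / align * align;
    if (start > pool->cap || size > pool->cap - start) {
        return NULL;  /* pool exhausted */
    }
    pool->used = start + size;
    return pool->buf + start;
}

/* The zombie header lives in the pool instead of on the stack; it is
 * released together with everything else when the pool is destroyed. */
static ZStr *pool_new_zstr(Pool *pool, const char *ptr, size_t size) {
    ZStr *z = pool_alloc(pool, sizeof(ZStr));
    if (z != NULL) {
        z->ptr  = ptr;
        z->size = size;
    }
    return z;
}

static void pool_destroy(Pool *pool) {
    free(pool->buf);
    pool->buf = NULL;
    pool->cap = pool->used = 0;
}
```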