lucy-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marvin Humphrey <mar...@rectangular.com>
Subject [lucy-dev] Re: Initial sketch of string iterators
Date Sat, 13 Apr 2013 02:01:02 GMT
On Fri, Apr 12, 2013 at 12:17 PM,  <nwellnhof@apache.org> wrote:
> Initial sketch of string iterators

> Project: http://git-wip-us.apache.org/repos/asf/lucy/repo
> Commit: http://git-wip-us.apache.org/repos/asf/lucy/commit/ad09b28f
> Tree: http://git-wip-us.apache.org/repos/asf/lucy/tree/ad09b28f
> Diff: http://git-wip-us.apache.org/repos/asf/lucy/diff/ad09b28f

Very nice!

*   The public API for CharBufIterator hierarchy looks great!
*   UTF8Iterator looks great.  I can imagine exactly what a UTF16Iterator
    would look like and how string comparisons would work when internal
    encodings differ.
*   The next patch, "Convert CB_Trim_{Top|Tail} to use iterators", looks
    perfect as well.  No need to implement in concrete subclasses, I see. :)
*   Should CharBufIterator#Next return 0 when it reaches the end of a string
    instead of throwing an exception?  That would make it possible to iterate
    through text values which are not expected to contain NUL without having
    to call Has_Next() before each call to Next().
*   I have mixed feelings about the Zombie* classes :) but so long as they are
    limited to parcel visibility, it's all good.
*   I can imagine some possibilities for optimizing the internals, but those
    are implementation details (which I imagine you left out on purpose).

It seems to me that the factory methods Top() and Tail() should be public,
while ZTop() and ZTail() should have parcel exposure.  More generally, I've
moved towards the position that while we want to have the core allocate
objects on the stack for certain internal tasks, it would be unwise at this
point for Clownfish to support stack object allocation as a public API.
I'm actually hoping that as we overhaul Clownfish string handling that
ZombieCharBuf can go away.  But what you've presented makes sense and fits
perfectly within the current context.

> +uint32_t
> +UTF8Iter_next(UTF8Iterator *self) {
> +    CharBuf *char_buf = self->char_buf;
> +    if (self->byte_offset >= char_buf->size) {
> +        THROW(ERR, "Iteration past end of string");
> +    }
> +    const char *ptr        = char_buf->ptr + self->byte_offset;
> +    uint32_t    code_point = StrHelp_decode_utf8_char(ptr);
> +    self->byte_offset += StrHelp_UTF8_COUNT[*(uint8_t*)ptr];
> +    return code_point;
> +}

How about something like this, which I think should have the same cost?

-    if (self->byte_offset >= char_buf->size) {
-        THROW(ERR, "Iteration past end of string");
-    }
+    if (self->byte_offset > char_buf->size) {
+        THROW(ERR, "Iteration past end of string");
+    }
+    else if (self->byte_offset == char_buf->size) {
+        return 0;
+    }

> +ZombieUTF8Iterator*
> +ZUTF8Iter_new(void *allocation, CharBuf *char_buf, size_t byte_offset) {
> +    ZombieUTF8Iterator *self = (ZombieUTF8Iterator*)allocation;
> +    self->ref.count   = 1;

Might want to go through VTable_Init_Obj(), no?  ;)

Marvin Humphrey

Mime
View raw message