> Correct, utf7/8 are otherwise escaped.

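It's stricter than that, at least for UTF-8, UTF-16 and UTF-32 (I haven't checked UTF-7): they never use a code unit below 0x80 except to represent the same character as in 7-bit ASCII. This means that for the byte-oriented encodings { ASCII, ISO-8859-x, UTF-8 } you can safely memchr(buf, '/', size) and rely on the result without back-tracking. For UTF-16 and UTF-32 the guarantee holds per 16- or 32-bit code unit, not per byte (U+2F00 is the byte pair 0x2F 0x00 in UTF-16BE), so there the search has to compare whole code units instead of using memchr.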
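A minimal C sketch contrasting the two kinds of search: a plain memchr loop, which is sound for UTF-8, ASCII and ISO-8859-x because every byte of a multi-byte UTF-8 sequence is >= 0x80, and a code-unit loop for UTF-16, where a raw byte search can land on half of a unit such as U+2F00 or U+012F. The function names are hypothetical illustrations, not APR APIs, and the UTF-16 version assumes the data is already in native byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count '/' characters in a UTF-8 (or ASCII / ISO-8859-x) buffer.
 * A byte scan is enough: bytes inside multi-byte UTF-8 sequences are
 * all >= 0x80, so every 0x2F byte found is a real '/'. */
size_t count_slashes_utf8(const char *buf, size_t size)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + size;

    while (p < end) {
        const char *hit = memchr(p, '/', (size_t)(end - p));
        if (hit == NULL)
            break;
        count++;
        p = hit + 1;            /* resume just past the match, no back-tracking */
    }
    return count;
}

/* Count '/' characters in native-endian UTF-16 data.
 * Each 16-bit code unit is compared as a whole; a byte-wise memchr
 * would also match the 0x2F byte inside units like 0x2F00 or 0x012F. */
size_t count_slashes_utf16(const uint16_t *units, size_t n_units)
{
    size_t count = 0;

    for (size_t i = 0; i < n_units; i++) {
        if (units[i] == 0x002F)
            count++;
    }
    return count;
}
```

The same shape carries over to UTF-32 with uint32_t units.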

FWIW, all those encodings also have the nice property that you can find the length of any character's encoding by examining only its first code unit (for ASCII, ISO-8859-x and UTF-8, its first byte). That property is helpful, for example, when writing lexers.
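A minimal C sketch of that first-unit property: the UTF-8 lead byte alone gives the sequence length, and for UTF-16 only a high surrogate (0xD800-0xDBFF) signals a two-unit pair. The function names are hypothetical. utf8_count_chars shows the lexer-style use, stepping from lead byte to lead byte without decoding; it checks lead bytes and truncation only and does not validate continuation bytes, so it is not a full UTF-8 validator.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the UTF-8 sequence that starts with `lead`,
 * or 0 if `lead` cannot start a sequence. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;  /* 0xxxxxxx: ASCII */
    if (lead < 0xC2) return 0;  /* continuation byte, or overlong C0/C1 */
    if (lead < 0xE0) return 2;  /* 110xxxxx */
    if (lead < 0xF0) return 3;  /* 1110xxxx */
    if (lead < 0xF5) return 4;  /* 11110xxx, up to U+10FFFF */
    return 0;                   /* F5-FF never appear in valid UTF-8 */
}

/* Length in bytes of the UTF-16 character whose first code unit is
 * `unit`, or 0 for a lone low surrogate. */
size_t utf16_seq_len(uint16_t unit)
{
    if (unit >= 0xD800 && unit <= 0xDBFF) return 4;  /* high surrogate: pair */
    if (unit >= 0xDC00 && unit <= 0xDFFF) return 0;  /* low surrogate can't lead */
    return 2;
}

/* Count characters by hopping over whole sequences.
 * Returns (size_t)-1 on a bad lead byte or a sequence cut off by the
 * end of the buffer. */
size_t utf8_count_chars(const unsigned char *buf, size_t size)
{
    size_t i = 0, chars = 0;

    while (i < size) {
        size_t len = utf8_seq_len(buf[i]);
        if (len == 0 || len > size - i)
            return (size_t)-1;
        i += len;
        chars++;
    }
    return chars;
}
```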

You're right that Shift_JIS in particular needs special attention. Locally, I try very hard not to support any non-Unicode character set, but I understand that's a luxury APR does not have.


Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102