stdcxx-dev mailing list archives

From "Eric Lemings" <Eric.Lemi...@roguewave.com>
Subject RE: STDCXX-435
Date Wed, 12 Mar 2008 15:37:10 GMT
 

> -----Original Message-----
> From: Martin Sebor [mailto:msebor@gmail.com] On Behalf Of Martin Sebor
> Sent: Tuesday, March 11, 2008 5:42 PM
> To: dev@stdcxx.apache.org
> Cc: Eric Lemings; Martin Sebor
> Subject: Re: STDCXX-435
> 
> I don't think it's that simple. IIRC, the problem is that we're using
> mbsrtowcs() on sequences that aren't guaranteed to be NUL-terminated.

Well the sequences in the test case are null terminated.

According to the C99 specs for mbsrtowcs(), "Conversion continues up to
and including a terminating null character, which is also stored.
Conversion stops earlier in two cases: when a sequence of bytes is
encountered that does not form a valid multibyte character, or (if
dst is not a null pointer) when len wide characters have been stored in
the array pointed to by dst."

So the function actually stops conversion in three cases: 1.) it
converts the terminating null character, 2.) it encounters an invalid
multibyte character, or 3.) it has stored len wide characters.
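
To make the stopping cases concrete, here is a minimal sketch that
shows cases 1 and 3 with plain ASCII input in the default "C" locale.
The helper name convert() and its stopped_at parameter are made up for
illustration and are not part of stdcxx or the test case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts the NUL-terminated string src into dst
   (capacity len wide characters) and reports where mbsrtowcs() left
   the source pointer. Returns whatever mbsrtowcs() returned. */
static size_t convert(const char *src, wchar_t *dst, size_t len,
                      const char **stopped_at)
{
    mbstate_t state;
    size_t n;

    assert(src != NULL && dst != NULL && stopped_at != NULL);
    memset(&state, 0, sizeof state);   /* initial conversion state */

    n = mbsrtowcs(dst, &src, len, &state);

    /* Case 1: src is now NULL (terminating null converted and stored).
       Case 3: src points just past the last character converted. */
    *stopped_at = src;
    return n;
}
```

With src "abc" and len 8 the call should return 3 and set the source
pointer to NULL. With len 2 it should return 2 and leave the source
pointer at the 'c'.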

The problem in the test case is that the src constitutes only one
character (even though it contains more valid characters) but the
function is called with a len argument of 2 -- the length of the
destination buffer.

> It seems that it should be straightforward to either specify a limit
> to mbsrtowcs() that's small enough so as to prevent the function from
> ever reaching the end of the (non-NUL-terminated) source sequence,

Right.  And I believe that limit is

	min(__from_end-__from, __to_limit-__to)

possibly adjusted to account for the length in bytes of the appropriate
character sequence type, e.g. mbrlen(), wcslen().
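
As a rough sketch of one such adjustment: the source range is measured
in bytes and the destination range in wide characters, and a single
multibyte character can occupy up to MB_CUR_MAX bytes. Dividing the
byte count by MB_CUR_MAX gives a conservative limit. This is my own
assumption about how the adjustment could look, and safe_limit() is a
hypothetical name. Any tail shorter than MB_CUR_MAX bytes would still
need separate handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>   /* MB_CUR_MAX */

/* Hypothetical helper: a len argument for mbsrtowcs() small enough
   that the conversion cannot consume more than the bytes available in
   [from, from_end) or store more than fits in [to, to_limit). */
static size_t safe_limit(const char *from, const char *from_end,
                         const wchar_t *to, const wchar_t *to_limit)
{
    size_t src_bytes, src_chars, dst_chars;

    assert(from <= from_end && to <= to_limit);

    src_bytes = (size_t)(from_end - from);
    dst_chars = (size_t)(to_limit - to);

    /* Each character needs at most MB_CUR_MAX bytes, so this many
       characters are certain to fit within the source range. */
    src_chars = src_bytes / MB_CUR_MAX;

    return src_chars < dst_chars ? src_chars : dst_chars;
}
```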

> or make sure there is a NUL at the end (by copying the short
> subsequence at the end of the source sequence into a small local
> buffer and NUL terminating it there).

Also possible.  It would be safer though more involved.
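
A minimal sketch of the local-buffer idea, assuming the tail is known
to fit in a small fixed-size buffer. The names convert_tail() and
TAIL_MAX are hypothetical. Note that a tail ending in a partial
multibyte character would likely be reported as invalid once the NUL
is appended, which may be part of what makes this more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

enum { TAIL_MAX = 32 };   /* assumed upper bound on the tail length */

/* Hypothetical helper: copies the non-NUL-terminated tail
   [from, from_end) into a local buffer, terminates it, and converts
   it with mbsrtowcs(). Returns whatever mbsrtowcs() returned. */
static size_t convert_tail(const char *from, const char *from_end,
                           wchar_t *to, size_t to_len)
{
    char buf[TAIL_MAX + 1];
    const char *src = buf;
    mbstate_t state;
    size_t nbytes;

    assert(from <= from_end && to != NULL);
    nbytes = (size_t)(from_end - from);
    assert(nbytes <= TAIL_MAX);

    memcpy(buf, from, nbytes);
    buf[nbytes] = '\0';                /* guarantee termination */

    memset(&state, 0, sizeof state);   /* initial conversion state */
    return mbsrtowcs(to, &src, to_len, &state);
}
```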

> I recall trying to get that approach to work
> at first and failing. I don't remember why it didn't work anymore,
> if it was because the code became too complex and inefficient or
> if it simply wasn't possible to guarantee that it would be correct
> in all cases.
> 
> In any event, I came to the conclusion that we can't call mbsrtowcs()
> to convert whole ranges of characters at once but that we must do the
> conversion one character at a time instead. That's what the attached
> patch does (I think). Since I wrote it months ago I don't remember
> how extensively I tested it, or if it even applies cleanly after so
> much time has passed. It does appear to fix the bug in the originally
> submitted test case though :)
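
For reference, a character-at-a-time conversion can be sketched with
mbrtowc(), which takes an explicit byte count and so does not depend
on a terminating NUL. This is only an illustration of the general
approach and not the attached patch. The name convert_each() and its
parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts [from, from_end) into [to, to_limit)
   one character at a time. Stores the first unconverted byte in
   *from_next. Returns the number of wide characters stored, or
   (size_t)-1 if an invalid sequence is encountered. */
static size_t convert_each(const char *from, const char *from_end,
                           wchar_t *to, wchar_t *to_limit,
                           const char **from_next)
{
    mbstate_t state;
    wchar_t *out = to;

    assert(from <= from_end && to <= to_limit && from_next != NULL);
    memset(&state, 0, sizeof state);   /* initial conversion state */

    while (from < from_end && out < to_limit) {
        /* The third argument bounds how many bytes may be examined. */
        size_t n = mbrtowc(out, from, (size_t)(from_end - from), &state);

        if (n == (size_t)-1) {         /* invalid sequence */
            *from_next = from;
            return (size_t)-1;
        }
        if (n == (size_t)-2)           /* incomplete character at end */
            break;
        if (n == 0)                    /* embedded NUL, assumed 1 byte */
            n = 1;

        from += n;
        ++out;
    }

    *from_next = from;
    return (size_t)(out - to);
}
```

Called on a three-byte source that is not NUL-terminated and a
destination with room for two wide characters, this should store two
characters and leave *from_next at the third byte.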

I'll do some manual testing with this patch -- give it a test drive.

Brad.
