incubator-stdcxx-dev mailing list archives

From Martin Sebor <se...@roguewave.com>
Subject Re: [PATCH] collate.cpp (was: RE: Localedef assertion failure on Windows)
Date Wed, 02 May 2007 15:43:56 GMT
This looks good. At first I was worried about the cost
of the additional map lookup but comparing the build
times of a couple of UTF-8 locales yielded no
appreciable difference so I think it's safe.

Thanks
Martin

Farid Zaripov wrote:
>  > -----Original Message-----
>  > From: Martin Sebor [mailto:sebor@roguewave.com]
>  > Sent: Tuesday, January 09, 2007 7:29 PM
>  > To: stdcxx-dev@incubator.apache.org
>  > Subject: Re: Localedef assertion failure on Windows
>  >
>  > Andrew Black wrote:
>  > > Greetings all.
>  > >
>  > > When building the UTF-8 locales on Windows with the debug version
>  > > of the localedef utility, the localedef utility terminates with a
>  > > failed assertion within the library (in
>  > > __rw_debug_iter::operator*() in _iterbase.h).  Within collate.cpp,
>  > > the failure occurs on line 579.
>  > >
>  > > A trace of the code
>  >
>  > It might be helpful to see the stack trace.
>  >
>  > > indicates that the last good iteration across this line is
>  > > iteration number 56677, for the token 'UFFFD'.
>  >
>  > I assume this is on line 23337 of UTF-8.
>  >
>  > > The following token (<U00010300>) fails because
>  > > __rw_debug_iter::_C_is_end() returns true.  However, my reading of
>  > > collate.cpp is that this condition shouldn't happen, as the loop
>  > > containing the statement in question is supposed to terminate when
>  > > this condition is reached.
>  > >
>  > > Does this indicate a flaw in std::map or something else?
>  >
>  > More likely, in collate.cpp or somewhere in the rest of localedef.
>  > I suspect it has to do with wchar_t being only 16 bits wide
>  > on Windows and the character map containing characters (such as
>  > <U00010300>) beyond that range. To fix this we'll either need
>  > to replace wchar_t with a 32-bit type or ignore characters
>  > that do not fit in 16 bits on Windows (and wherever else wchar_t isn't
>  > 32 bits, such as AIX).
> 
>  I looked into this problem today.
> 
>  As far as I can see, when localedef processes the charmap file
> (Charmap::process_chars()), Charmap::add_to_cmaps() is invoked for
> each character in the CHARMAP section. There the symbol name is added
> to symnames_list_, but the character is not added to the maps w_cmap_,
> rw_cmap_, mb_cmap_, and rmb_cmap_ because convert_to_wc() returns the
> result of convert_to_ucs(), which is false.
> 
>  Then in Def::add_missing_values(), w_cmap.find() returns w_cmap.end()
> because the character was never inserted (see above), but the iterator
> is dereferenced without being checked.
> 
>  The proposed patch is attached.
> 
>  Another question: why do we first iterate through
> charmap_.get_symnames_list() and then search for each symbol in
> charmap_.get_w_cmap(), instead of just iterating through
> charmap_.get_w_cmap()?
> 
> Farid.
> 
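
For readers following the diagnosis quoted above, here is a minimal
sketch of the failure chain and of the kind of check that avoids it.
It is not the attached patch and not the localedef source: ToyCharmap,
to_wide(), mapped_symbols(), and the 16-bit wide_char typedef are
hypothetical stand-ins, and the map is assumed to be keyed by symbol
name. The point is only that a symbol can be recorded in the name list
without ever reaching the map, so the result of find() has to be
compared against end() before it is dereferenced.

```cpp
#include <list>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a 16-bit wchar_t (as on Windows).
typedef unsigned short wide_char;

// Hypothetical conversion: fails when the code point does not fit
// into 16 bits, loosely mirroring the failing conversion described above.
bool to_wide(unsigned long ucs, wide_char& out) {
    if (ucs > 0xFFFFUL)
        return false;
    out = static_cast<wide_char>(ucs);
    return true;
}

// Toy model of a charmap: every symbol name is recorded, but only
// convertible characters are entered into the map.
struct ToyCharmap {
    std::list<std::string> symnames;
    std::map<std::string, wide_char> w_cmap;

    void add(const std::string& sym, unsigned long ucs) {
        symnames.push_back(sym);      // always recorded
        wide_char wc;
        if (to_wide(ucs, wc))
            w_cmap[sym] = wc;         // skipped when conversion fails
    }
};

// Walk the symbol list and look each name up in the map, skipping
// names that were never inserted instead of dereferencing end().
std::vector<std::string> mapped_symbols(const ToyCharmap& cm) {
    std::vector<std::string> found;
    for (std::list<std::string>::const_iterator it = cm.symnames.begin();
         it != cm.symnames.end(); ++it) {
        std::map<std::string, wide_char>::const_iterator pos =
            cm.w_cmap.find(*it);
        if (pos == cm.w_cmap.end())
            continue;                 // the guard: no entry, no dereference
        found.push_back(pos->first);
    }
    return found;
}
```

In this sketch, adding "<UFFFD>" (0xFFFD) and "<U00010300>" (0x10300)
leaves two names in the list but only one map entry, and the guarded
loop simply skips the second name. Iterating over the map directly, as
the closing question suggests, would avoid the lookup altogether in
this toy model.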

