stdcxx-dev mailing list archives

From Martin Sebor <>
Subject Re: low hanging fruit while cleaning up test failures
Date Thu, 03 Jan 2008 01:24:40 GMT
[forwarding back to the list]

Travis Vitek wrote:
 > Martin,
 > Not all supported platforms have the GB18030 encoding [HP, Compaq and
 > IRIX don't], and of those that do, they use different names [gb18030 vs
 > GB18030]. Same with UTF-8 [utf8, UTF8 or UTF-8].

I realize that. That's the reason why I mentioned the (currently quite
inefficient) find_mb_locale() in 22.locale.codecvt.out.cpp: it looks
for any multibyte locale with MB_CUR_MAX of some value.
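
A minimal sketch of that kind of lookup, assuming the candidate locale
names are supplied by the caller. It is not the find_mb_locale() from
the test suite; both function names below are hypothetical, and only
setlocale() and MB_CUR_MAX come from the C library.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns MB_CUR_MAX for the named locale, or 0
// if setlocale() rejects the name. The caller's LC_CTYPE is restored.
// name must not be null.
std::size_t mb_cur_max_of (const char *name)
{
    // copy the current name; a later setlocale() call may overwrite it
    const char* const cur = std::setlocale (LC_CTYPE, 0);
    const std::string saved (cur ? cur : "C");

    std::size_t result = 0;
    if (std::setlocale (LC_CTYPE, name))
        result = std::size_t (MB_CUR_MAX);

    std::setlocale (LC_CTYPE, saved.c_str ());
    return result;
}

// Hypothetical helper: returns the first of the n candidate names whose
// MB_CUR_MAX equals want, or an empty string if none does.
std::string find_locale_with_mb_cur_max (const char* const names[],
                                         std::size_t       n,
                                         std::size_t       want)
{
    for (std::size_t i = 0; i != n; ++i)
        if (mb_cur_max_of (names [i]) == want)
            return names [i];

    return std::string ();
}
```

Because the match is on MB_CUR_MAX alone, the spelling of the codeset in
the locale name never enters into it.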

 > On top of that, Windows
 > uses code page numbers instead of encoding names. Now I can easily convert
 > names to uppercase and strip out non-alphanumeric characters. I might
 > even be able to do something with Windows. That isn't the problem.
 >
 > The problem is that I don't seem to have a clear understanding of what
 > exactly you want from this. The original proposal was to be able to
 > filter locales by name or encoding, like so...
 >
 >     char* locales = rw_locales (_RWSTD_LC_ALL, "en_US de", 0, true);
 >     // would retrieve C en_US and all German locales
 >
 >     char* locales = rw_locales (_RWSTD_LC_ALL, 0, "UTF-8", true);
 >     // would retrieve all UTF-8 locales
 >     // [those that end in .utf8, .UTF-8 or .UTF8]
 >
 > So I wrote that. Unfortunately, as mentioned above, it has limitations.
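
The name normalization described above (convert to uppercase, strip
non-alphanumeric characters) could look roughly like the sketch below.
Both function names are hypothetical and are not part of the test
driver; the code also assumes the common language_TERRITORY.codeset@modifier
layout of locale names, which does not hold on every platform.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: uppercases name and drops every character that
// is not alphanumeric, so "utf8", "UTF8" and "UTF-8" compare equal.
std::string normalize_codeset (const std::string &name)
{
    std::string out;

    for (std::string::size_type i = 0; i != name.size (); ++i) {
        // cast first: <cctype> functions require unsigned char values
        const unsigned char c = static_cast<unsigned char>(name [i]);
        if (std::isalnum (c))
            out += static_cast<char>(std::toupper (c));
    }

    return out;
}

// Hypothetical helper: extracts the text between '.' and an optional
// '@' modifier; returns an empty string if the name has no '.'.
std::string codeset_of (const std::string &locale_name)
{
    const std::string::size_type dot = locale_name.find ('.');
    if (dot == std::string::npos)
        return std::string ();

    const std::string::size_type at = locale_name.find ('@', dot);
    const std::string::size_type len =
        at == std::string::npos ? std::string::npos : at - dot - 1;

    return locale_name.substr (dot + 1, len);
}
```

Comparing normalize_codeset (codeset_of (a)) with the same expression for
b treats zh_CN.gb18030 and zh_CN.GB18030 as naming the same encoding. It
does nothing for platforms that lack the encoding or that use code page
numbers.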

Right. It wasn't a thought-out proposal. I was just brainstorming :)
We may not be able to use any of it to fix the hanging tests.

 > What I really need is some actual requirements so that I can write some
 > code and get this bug closed.

My only requirement is to get those tests to pass in a reasonable
amount of time (i.e., without timing out), and without compromising
their effectiveness.

 > It seems that you want to guarantee that we test multibyte locales.

It seems important to exercise ctype::do_narrow() in this case but
I haven't looked at the code very carefully. It could be that the
code path in the multibyte case isn't any different from the single
byte case.

 > Do
 > we want to give up on the locale name matching, or do we want to include
 > zh_CN in the list of locales to test? What about matching the encoding?
 > Should we ignore all of this and just find one locale for each value of
 > MB_CUR_MAX from 1 to MB_LEN_MAX and run the test on them?

Maybe. I'll let you propose what makes the most sense to you :)
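
For illustration, the "one locale for each value of MB_CUR_MAX" idea
could be sketched as below. It assumes the list of installed locales
and their MB_CUR_MAX values has already been collected somehow; the
LocaleInfo type and the function name are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: locale name and the MB_CUR_MAX it reports.
typedef std::pair<std::string, std::size_t> LocaleInfo;

// Returns the first locale seen for each MB_CUR_MAX value in the range
// [1, mb_len_max], ordered by increasing MB_CUR_MAX.
std::vector<std::string>
one_locale_per_width (const std::vector<LocaleInfo> &installed,
                      std::size_t                    mb_len_max)
{
    std::map<std::size_t, std::string> chosen;

    for (std::size_t i = 0; i != installed.size (); ++i) {
        const std::size_t width = installed [i].second;
        if (width < 1 || width > mb_len_max)
            continue;

        // map::insert leaves an existing entry alone, so the first
        // locale found for a given width is the one that is kept
        chosen.insert (std::make_pair (width, installed [i].first));
    }

    std::vector<std::string> result;
    for (std::map<std::size_t, std::string>::const_iterator it =
             chosen.begin (); it != chosen.end (); ++it)
        result.push_back (it->second);

    return result;
}
```

This bounds the number of locales a test visits by MB_LEN_MAX rather
than by the number of locales installed.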


 > Travis
 >> -----Original Message-----
 >> From: Martin Sebor [] On Behalf Of Martin Sebor
 >> Sent: Wednesday, January 02, 2008 1:41 PM
 >> To:
 >> Subject: Re: low hanging fruit while cleaning up test failures
 >> Travis Vitek wrote:
 >>> Martin Sebor wrote:
 >>>> Travis Vitek wrote:
 >>>>> Martin Sebor wrote:
 >>>>>> Travis Vitek wrote:
 >>>>>>> Martin Sebor wrote:
 >>>>>>>> I added a new function, rw_fnmatch(), to the test driver. It behaves
 >>>>>>>> just like the POSIX fnmatch() (the FNM_XXX constants aren't
 >>>>>>>> implemented yet). While the main purpose behind the new function is
 >>>>>>>> to support STDCXX-683 it should make it easier to also implement a
 >>>>>>>> scheme like the one outlined below.
 >>>>>>>> Travis, feel free to experiment/prototype a solution :)
 >>>>>>>> Martin
 >>>>>>> What expression should be used to get an appropriate set of locales
 >>>>>>> for a given platform? I can't really expect a filter for all UTF-8
 >>>>>>> locales to work on all platforms as some don't have those encodings
 >>>>>>> available at all. If I filter by language, then I may be limiting
 >>>>>>> the testing to some always correct subset. Is that acceptable for
 >>>>>>> the MT tests?
 >>>>>> I think the MT ctype tests just need to exercise a representative
 >>>>>> sample of multi-byte encodings (i.e., MB_CUR_MAX between 1 and
 >>>>>> MB_LEN_MAX). There already is some code in the test suite to find
 >>>>>> locales that use these encodings, although it could be made more
 >>>>>> efficient. I don't know how useful rw_fnmatch() will turn out to
 >>>>>> be in finding these codesets since their names don't matter.
 >>>>>> Martin
 >>>>>>> Travis
 >>>>> Actually, I think I meant to say single-threaded tests. Those are the
 >>>>> ones that currently test every locale. The multi-threaded tests
 >>>>> already test a subset of locales, though the method for selecting
 >>>>> those locales may vary between tests.
 >>>>> I don't think it is right to test a fixed set of locales based on
 >>>>> language, country, or encoding. If you agree, then we probably agree
 >>>>> that the proposed enhancement doesn't actually do anything useful
 >>>>> [and I've wasted a bunch of time]. If this is the case, then we need
 >>>>> to propose another solution for selecting locales.
 >>>> I think testing a small subset of installed locales should be enough.
 >>>> In fact, for white box testing of the ctype facets, exercising three
 >>>> locales, "C" and two named ones, should be sufficient.
 >>>>> If I am wrong, and it is useful for testing [and more specifically
 >>>>> how it would be useful for fixing STDCXX-608], then I'd like to hear
 >>>>> how.
 >>>> What do you propose?
 >>>> Martin
 >>> Okay. I can live with that. Then the issue now becomes deciding which
 >>> additional locales to test. How about just testing all Spanish and
 >>> German locales?
 >> I'd make sure at least one of them uses a multibyte encoding. Maybe
 >> zh_CN.GB18030 (with MB_CUR_MAX of 4)?
 >> Martin
