Message-ID: <46CB0542.5060109@roguewave.com>
Date: Tue, 21 Aug 2007 09:31:14 -0600
From: Martin Sebor
Organization: Rogue Wave Software, Inc.
To: stdcxx-dev@incubator.apache.org
Subject: Re: expectation vs requirements for locale facets

Travis Vitek wrote:
>> Martin Sebor wrote:
>>
>>
>> Yes. But notice the text doesn't say anything about time_put_byname or
>> time_get_byname ;-)
>>
>
> Well, the standard doesn't say much at all about the *_byname<>
> facets. All it really says about them is
>
> [21.1.1.2 p4] For some standard facets a standard "..._byname" class,
[...]

The _byname requirements are extremely vague. Sometimes they are
also implied by the requirements on the base facets, which makes
them difficult to find. It's a mess.

>
> So, if I'm reading that right, the *_byname<> facet classes are just
> there to prevent the user from having to instantiate a std::locale
> directly.

I'm not sure what you mean by this. The _byname facets are really
just an implementation detail that's exposed in the interface of
the locale library. They should have never been specified.

>
>> The C++ standard (or even the C standard for that
>> matter) isn't going to be of help here.
>
> Wait. Say what now? I'm not sure what you're trying to tell me here.
> If the C++ Standard says that these facets read or write years as
> roman numerals, then they should probably do so, regardless of what
> any other standard document requires. I think this will actually get
> cleared up in a few seconds...

The C and C++ standards only specify the requirements on the "C"
locale and leave the localized behavior unspecified. So pretty
much anything goes.
There are some ground rules but I suspect you won't be able to
tease the requirement on swallowing leading space for the %e
directive out of them.

>
>>> Of
>>> course that isn't what I'm seeing.
>> Test case?
>
> Yeah. See attachment. Only tested on Win32/VC8 and Linux/GCC.

Thanks. Here are the results with stdcxx and with g++ 3.4.6:

$ ./t.stdcxx | grep fail
string=07/06/08 result=fail locale=thai
string= 7.06.1908 result=fail locale=bg_BG
string=07/06/08 result=fail locale=lo_LA
string=07/06/08 result=fail locale=th_TH

$ ./t.gcc | grep fail
string=��� %.1d ��� 1908 result=fail locale=ar_SA
string=۰۸/۰۶/۰۷ result=fail locale=fa_IR
string=ಗುರುವಾರ 07 ಜೂ 1908 result=fail locale=kn_IN

Looks like g++ is failing on multibyte character sequences but
not on the spaces. We seem to somehow manage to process the
multibyte sequences (I wonder how, or if it's a weakness in the
test) but have issues with the leading space in bg_BG. I don't
know what the problem is with the other locales...

>
>> It's hard to say from just looking at the code (and I haven't looked
>> very carefully). In general, we [try to] implement the POSIX
>> semantics, so if it works with strptime()/strftime() it should work
>> with our time_put_byname/time_get_byname.
>>
>
> Well, there's the problem right there. The standard requires that the
> time_put<> facet format its output according to the POSIX function
> strftime(), with the option for supporting extensions. It makes no
> indication that the time_get<> facet should read data in such a way as
> to be compatible with strptime(). The only thing I see that says
> anything about the format expected by time_get<> is here...
[...]
> Right. Pretty vague.
>
> This paragraph says that time_get<>::get_date() is supposed to process
> the output of time_put<>::put(..., 'x').
>
> [22.2.5.1.2 p4] Effects: Reads characters starting at s until it has
> extracted those struct tm members, and remaining format characters,
> used by time_put<>::put to produce the format specified by 'x' or
> until it encounters an error.

Yes. The problem with the C++ standard in this area is that the
requirements are vague and not always implementable (e.g., the
multibyte sequences -- all the narrow specializations of the _get
facets operate on single characters).

>
>> If we test this behavior it's gotta be right ;-) Where does POSIX
>> say leading spaces must be skipped? I see this under %e: Equivalent
>> to %d. And under %d: The day of the month [01,31]; leading zeros
>> are permitted but not required. Nothing about ignoring spaces.
>>
>
> Absolutely. The docs for POSIX strftime()...
[...]
> So strftime() isn't even compatible with strptime() when it comes to '%e'.

Hmm. That seems like a bug in POSIX then, unless we're missing
something. You might want to create a POSIX-only test case to
verify this and, if I'm right, open a discussion on the Austin
Group list (http://www.opengroup.org/austin/lists.html).

> [...]
> Unfortunately, without consistent input/output it is going to be
> difficult for this multi-threading test to verify that no data
> corruption is occurring with arbitrary locales. Hopefully there is some
> system in place that allows us to explicitly specify which locales are
> to be used for a test.

Not really. My approach would be to detect locales with this
problem and avoid using them. The test also doesn't need to be
exhaustive, at least not in this iteration. I think exercising
just the most common patterns should be good enough (although
%X is pretty common :)

Martin
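
The POSIX-only test case suggested above for %e could be sketched
roughly as follows. This is only an illustration, not code from the
stdcxx test suite: the RoundTrip struct and check_e_round_trip() are
hypothetical names, and the sketch assumes a POSIX system where
strptime() is declared in <time.h>. It formats a single-digit day of
the month with strftime("%e"), feeds the result back to
strptime("%e"), and reports whether the round trip worked in a given
locale. The same check could serve to detect locales to avoid.

```cpp
#include <cassert>
#include <clocale>
#include <ctime>
#include <string>
#include <time.h>   // strptime() is POSIX, not ISO C or C++

// Hypothetical helper type: outcome of one strftime/strptime round trip.
struct RoundTrip {
    bool        locale_ok;   // setlocale() accepted the locale name
    std::string formatted;   // what strftime() produced for "%e"
    bool        parsed;      // strptime() consumed the entire string
    int         mday;        // tm_mday read back (meaningful if parsed)
};

// Hypothetical helper: format day 7 with %e in the named locale and
// try to read it back with %e. LC_TIME is restored before returning.
RoundTrip check_e_round_trip(const char* locale_name)
{
    assert(locale_name != 0);

    RoundTrip r = { false, std::string(), false, 0 };

    // Remember the current LC_TIME setting so it can be restored.
    const char* cur = std::setlocale(LC_TIME, 0);
    const std::string saved = cur ? cur : "C";

    if (!std::setlocale(LC_TIME, locale_name))
        return r;               // locale not available on this system
    r.locale_ok = true;

    std::tm in = std::tm();     // zero-initialized
    in.tm_mday = 7;             // single digit, so %e has to pad it

    char buf[64];
    const std::size_t n = std::strftime(buf, sizeof buf, "%e", &in);
    r.formatted.assign(buf, n);

    if (n != 0) {
        std::tm out = std::tm();
        const char* end = ::strptime(buf, "%e", &out);
        r.parsed = end != 0 && *end == '\0';
        if (r.parsed)
            r.mday = out.tm_mday;
    }

    std::setlocale(LC_TIME, saved.c_str());
    return r;
}
```

Calling check_e_round_trip("bg_BG") (or any other installed locale
name) and inspecting formatted and parsed would show whether the
leading space written for %e is accepted on input by that platform's
strptime(). A result with parsed == false, or with mday != 7, would
be the kind of evidence worth taking to the Austin Group list.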