stdcxx-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Stdcxx Wiki] Update of "LocaleLookup" by TravisVitek
Date Thu, 27 Mar 2008 05:00:39 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Stdcxx Wiki" for change notification.

The following page has been changed by TravisVitek:
http://wiki.apache.org/stdcxx/LocaleLookup

------------------------------------------------------------------------------
  
  || Test || Criteria ||
  || 22.LOCALE.CODECVT.MT.CPP || *1,+ ||
- || 22.LOCALE.CODECVT.OUT.CPP || *2 ||
+ ||<rowstyle="color:green"> 22.LOCALE.CODECVT.OUT.CPP || *10 ||
  || 22.LOCALE.CONS.MT.CPP || *1,+ ||
  || 22.LOCALE.CTYPE.CPP || *2 ||
  || 22.LOCALE.CTYPE.IS.CPP || *2 ||
@@ -34, +34 @@

  || 22.LOCALE.MONEY.PUT.MT.CPP || *1,+ ||
  || 22.LOCALE.MONEYPUNCT.CPP || *4 ||
  || 22.LOCALE.MONEYPUNCT.MT.CPP || *1,+ ||
- || 22.LOCALE.NUM.GET.CPP || *9 ||
+ ||<rowstyle="color:red"> 22.LOCALE.NUM.GET.CPP || *9 ||
  || 22.LOCALE.NUM.GET.MT.CPP || *1,+ ||
- || 22.LOCALE.NUM.PUT.CPP || *9 ||
+ ||<rowstyle="color:red"> 22.LOCALE.NUM.PUT.CPP || *9 ||
  || 22.LOCALE.NUM.PUT.MT.CPP || *1,+ ||
  || 22.LOCALE.NUMPUNCT.MT.CPP || *1,+ ||
  || 22.LOCALE.STATICS.MT.CPP || *4,+ ||
- || 22.LOCALE.TIME.GET.CPP || *5,6 ||
+ ||<rowstyle="color:green"> 22.LOCALE.TIME.GET.CPP || *5,6 ||
  || 22.LOCALE.TIME.GET.MT.CPP || *1,+ ||
  || 22.LOCALE.TIME.PUT.MT.CPP || *1,+ ||
  
- * Any locale for which setlocale (LC_ALL, name) will succeed.
+  1. Any locale for which setlocale (LC_ALL, name) will succeed.
- * Any locale for which setlocale (LC_CTYPE, name) will succeed.
+  1. Any locale for which setlocale (LC_CTYPE, name) will succeed.
- * Any locale for which setlocale (LC_NUMERIC, name) will succeed.
+  1. Any locale for which setlocale (LC_NUMERIC, name) will succeed.
- * All installed locales.
+  1. All installed locales.
- * First locale matching a specific name.
+  1. First locale matching a specific name.
- * First locale matching a regular expression.
+  1. First locale matching a regular expression.
- * First locale that is not an alias for the C/POSIX locale.
+  1. First locale that is not an alias for the C/POSIX locale.
- * Any locale for which setlocale (LC_ALL, name) will succeed, list includes C/POSIX locale.
+  1. Any locale for which setlocale (LC_ALL, name) will succeed, list includes C/POSIX locale.
- * Any locale for which setlocale (LC_NUMERIC, name) will succeed and decimal_point is not
'.'
+  1. Any locale for which setlocale (LC_NUMERIC, name) will succeed and decimal_point is
not '.'
+  1. Locale with largest MB_CUR_MAX value.
  + Test limits the number of locales tested.
  
  ||<rowstyle="color:red">Note: Most of the MT tests limit the number of locales to
32, so the test failure is not a matter of running against too many locales, it is an issue
of running too many iterations per thread. The 'solution' discussed in this document doesn't
seem to address the actual problem for these tests.||
+ ||<rowstyle="color:red">Note: Most of the tests simply run against all locales that
have a specified category. We need to decide how to further reduce the number of locales tested.||
  
  [[Anchor(Definitions)]]
  = Definitions =
@@ -71, +73 @@

  
  This page relates to the issue described in [http://issues.apache.org/jira/browse/STDCXX-608
STDCXX-608]. There has been some discussion both on and off the dev@ list about how to proceed.
This page is here to document what has been discussed.
  
- The plan to meet the [#Objective Objective] is to provide an interface to query the set
of installed locales based on a set of a small number of essential parameters used by the
localization tests. The interface should make it easy to express conjunction, disjunction,
and negation of the terms (parameters) and support (a perhaps simplified version of) [http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03
Basic Regular Expression] syntax. We've decided to use shell brace expansion as a means of
expressing logical conjunction between terms: a valid brace expression is expanded to obtain
a set of terms implicitly connected by a logical AND. Individual ('\n'-separated) lines of
the query string are taken to be implicitly connected by a logical OR. This approach models
the [http://www.opengroup.org/onlinepubs/009695399/utilities/grep.html grep] interface with
each line loosely corresponding to the argument of the `-e` option to `grep`.
+ The plan to meet the [#Objective Objective] is to provide an interface to query the set
of installed locales based on a small set of essential parameters used by the
localization tests. The interface should make it easy to express conjunction, disjunction,
and negation of the terms (parameters) and support (a perhaps simplified version of) [http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03
Basic Regular Expression] syntax. We've decided to use shell brace expansion as a means of
expressing logical conjunction between terms: a valid brace expression is expanded to obtain
a set of terms implicitly connected by a logical AND. Individual ('\n'-separated) lines of
the query string are taken to be implicitly connected by a logical OR. This approach models
the [http://www.opengroup.org/onlinepubs/009695399/utilities/grep.html grep] interface with
each line loosely corresponding to the argument of the {{{-e}}} option to {{{grep}}}.
  
  [[Anchor(Part1)]]
  = Part 1 (STDCXX-714) =
  
- The first thing that we needed was to write the function for doing Basic Regular Expression
name matching and add it to the test suite.. Martin has already added an implementation of
[http://svn.apache.org/viewvc/stdcxx/trunk/tests/src/fnmatch.cpp rw_fnmatch](), so that is
done. `rw_fnmatch()` is a simplified implementation of the POSIX [http://www.opengroup.org/onlinepubs/009695399/functions/fnmatch.html
fnmatch] function which supports a simplified and modified form of BRE used in filename globbing.
This is sufficient for what we need in term of regular expression support.
+ The first thing that we needed was to write the function for doing Basic Regular Expression
name matching and add it to the test suite. Martin has already added an implementation of
[http://svn.apache.org/viewvc/stdcxx/trunk/tests/src/fnmatch.cpp rw_fnmatch](), so that is
done. {{{rw_fnmatch()}}} is a simplified implementation of the POSIX [http://www.opengroup.org/onlinepubs/009695399/functions/fnmatch.html
fnmatch]() function which supports a simplified and modified form of BRE used in filename
globbing. This is sufficient for what we need in terms of regular expression support.
  
  The second thing that we needed was a function to do brace expansion. After much discussion,
it was decided that the csh brace expansion rules made the most sense. Travis provided an
implementation of a function for doing brace expansion. The function [http://svn.apache.org/viewvc/stdcxx/trunk/tests/src/braceexp.cpp
rw_shell_expand]() does whitespace tokenization and collapse, and then does brace expansion
on each token, much like the behavior you would see from the csh shell.
  
  Just for illustration, consider the following string.
  
  {{{
-    a-{1,2}-b
+ a-{1,2}-b
  }}}
  
- If you passed this to `rw_shell_expand()` (with ' ' as the seperator), the result would
be
+ If you passed this to {{{rw_shell_expand()}}} (with ' ' as the separator), the result would
be
  
  {{{
-    a-1-b a-2-b
+ a-1-b a-2-b
  }}}
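
The expansion shown above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: {{{expand_braces()}}} is a hypothetical helper, not the {{{rw_shell_expand()}}} implementation in the test suite. It handles a single token, does no whitespace tokenization, and ignores csh special cases such as an empty {{{{}}}} group.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not rw_shell_expand()): expands the first brace
// group found in `token`, then recurses to expand any remaining groups.
std::vector<std::string> expand_braces (const std::string &token)
{
    const std::size_t open = token.find ('{');
    if (open == std::string::npos)
        return {token};               // nothing to expand

    // locate the matching close brace, tracking the nesting depth
    std::size_t close = open;
    int depth = 0;
    for (; close < token.size (); ++close) {
        if (token [close] == '{')
            ++depth;
        else if (token [close] == '}' && --depth == 0)
            break;
    }
    if (close == token.size ())
        return {token};               // unbalanced brace: leave untouched

    const std::string prefix = token.substr (0, open);
    const std::string suffix = token.substr (close + 1);

    std::vector<std::string> result;
    std::size_t start = open + 1;     // start of the current alternative
    depth = 0;

    // split the group body on commas that are not inside a nested group
    for (std::size_t i = open + 1; i <= close; ++i) {
        const char c = token [i];
        if (c == '{')
            ++depth;
        else if (c == '}' && i != close)
            --depth;

        if ((c == ',' && depth == 0) || i == close) {
            const std::string alt = token.substr (start, i - start);
            for (const std::string &s : expand_braces (prefix + alt + suffix))
                result.push_back (s);
            start = i + 1;
        }
    }
    return result;
}
```

Calling {{{expand_braces ("a-{1,2}-b")}}} yields the two strings {{{a-1-b}}} and {{{a-2-b}}}, in that order. Joining the results with the separator character is left to the caller.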
  
  [[Anchor(Part2)]]
@@ -102, +104 @@

  The format of these files is simple. Here is a grammar
  
  {{{
-   native-name-list ::= <native-name> | <native-name> ',' <native-name-list>
| '\n' <ws> <native-name-list>
+ native-name-list ::= <native-name> | <native-name> ',' <native-name-list>
| '\n' <ws> <native-name-list>
-   line         ::= '#' <comment> | <canonical-name> <native-name-list>
+ line         ::= '#' <comment> | <canonical-name> <native-name-list>
-   line-list    ::= <line> | <line> '\n' <line-list> 
+ line-list    ::= <line> | <line> '\n' <line-list> 
  }}}
  
  The grammar is comma delimited, so the strings are not to be quoted. Here is an example
to illustrate.
  
  {{{
-   # this is a comment line
+ # this is a comment line
  
-    # _not_ a comment line
+  # _not_ a comment line
-   # the above maps '_not_ a comment line' to the value '#'
+ # the above maps '_not_ a comment line' to the value '#'
  
-   # map 'English' to 'en'
+ # map 'English' to 'en'
-   en	English
+ en	English
  
-   # map 'Albanian', 'alb' and 'sqi' to 'sq'
+ # map 'Albanian', 'alb' and 'sqi' to 'sq'
-   sq    Albanian, alb, sqi
+ sq    Albanian, alb, sqi
  
-   # similar to above, except that mapping is multiline
+ # similar to above, except that mapping is multiline
-   cu    Church Slavic, Old Slavonic, Church Slavonic,
+ cu    Church Slavic, Old Slavonic, Church Slavonic,
-         Old Bulgarian, Old Church Slavonic, chu
+       Old Bulgarian, Old Church Slavonic, chu
  }}}
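
A reader for this file format could look roughly like the sketch below. It rests on assumptions drawn from the example rather than from any existing code: a line is a comment only if '#' is its first character, the first whitespace-delimited token is the canonical name, and a list that ends in a comma continues on the next line. The names {{{trim()}}} and {{{parse_aliases()}}} are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing blanks and tabs.
std::string trim (const std::string &s)
{
    const std::size_t b = s.find_first_not_of (" \t");
    if (b == std::string::npos)
        return std::string ();
    const std::size_t e = s.find_last_not_of (" \t");
    return s.substr (b, e - b + 1);
}

// Hypothetical parser: returns a map from native name to canonical name.
std::map<std::string, std::string> parse_aliases (std::istream &in)
{
    std::map<std::string, std::string> aliases;
    std::string line, canonical;
    bool continued = false;     // did the previous line end with ','?

    while (std::getline (in, line)) {
        if (!continued) {
            // assumption: only '#' in the first column starts a comment
            if (line.empty () || line [0] == '#')
                continue;
            line = trim (line);
            if (line.empty ())
                continue;

            // first token is the canonical name, the rest is the list
            const std::size_t ws = line.find_first_of (" \t");
            canonical = line.substr (0, ws);
            line = ws == std::string::npos ? std::string () : line.substr (ws);
        }

        line = trim (line);
        continued = !line.empty () && line.back () == ',';

        // split the native-name list on commas
        std::istringstream names (line);
        std::string name;
        while (std::getline (names, name, ',')) {
            name = trim (name);
            if (!name.empty ())
                aliases [name] = canonical;
        }
    }
    return aliases;
}
```

Fed the example above, the resulting map would send {{{English}}} to {{{en}}}, {{{alb}}} to {{{sq}}}, and {{{chu}}} to {{{cu}}}.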
  
  [[Anchor(Part3)]]
@@ -132, +134 @@

  The proposed interface to all of this is a single public function named rw_query_locales().
The signature would be...
  
  {{{
-   char* rw_query_locales (int loc_cat, const char* query, size_t count);
+ char* rw_query_locales (int loc_cat, const char* query, size_t count);
  }}}
  
  The {{{loc_cat}}} parameter is the locale category to get locales for, just like {{{rw_locales()}}}
does in its current implementation. The {{{query}}} parameter will be the query string. The
{{{count}}} parameter is the maximum number of locales to return. This allows you to easily
limit the number of locales returned and eventually tested.
@@ -140, +142 @@

  The proposed grammar used by the query string is similar to what is used for the xfail.txt
{{{config}}} string. It is a shell globbed string that has its terms joined with dashes.
  
  {{{
-   <match> is a shell globbing pattern in the format below. All fields 
+ <match> is a shell globbing pattern in the format below. All fields are required.

-   are required. 
  
-   iso-country  ::= ISO-639-1 or ISO-639-2 two or three character country code 
+ iso-country  ::= ISO-3166 two character country code
-   iso-language ::= ISO-3166 two character language code 
+ iso-language ::= ISO-639-1 or ISO-639-2 two or three character language code
-   iana-codeset ::= IANA codeset name with '-' replaced or removed 
+ iana-codeset ::= IANA codeset name
  
-   match        ::= <iso-language-expr> '-' <iso-country-expr> '-' <mb_cur_len-expr>
'-' <iana-codeset-expr>
+ match        ::= <iso-language-expr> '-' <iso-country-expr> '-' <mb_cur_len-expr>
'-' <iana-codeset-expr>
-   match_list   ::= match | match ' ' match_list 
+ match_list   ::= match | match ' ' match_list 
  }}}
  
  So, given a query string 
  
  {{{
-   *-{CA,US}-1-{ISO-8859-1,UTF-8}
+ *-{CA,US}-1-{ISO-8859-1,UTF-8}
  }}}
  
  this function would internally apply brace expansion to get the following list of expressions
  
  {{{
-   *-CA-1-*-ISO-8859-1 *-CA-1-*-UTF-8 *-US-1-*-ISO-8859-1 *-US-1-*-UTF-8
+ *-CA-1-ISO-8859-1 *-CA-1-UTF-8 *-US-1-ISO-8859-1 *-US-1-UTF-8
  }}}
  
  ||<rowstyle="color:red"> /!\ Notice that I have moved the codeset to be the last match
in the query string. That is because the codeset string is allowed to contain dashes. This
was done to avoid issues with accidentally mistaking dashes in the codeset name with dashes
in the grammar.||
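
To illustrate how expanded patterns might be applied, the sketch below matches four-field patterns of the form language-country-mb_cur_len-codeset against a list of canonical names. It uses the POSIX {{{fnmatch()}}} function as a stand-in for {{{rw_fnmatch()}}}; {{{select_locales()}}} and the canonical names in the usage note are hypothetical.

```cpp
#include <fnmatch.h>    // POSIX fnmatch(), stand-in for rw_fnmatch()
#include <string>
#include <vector>

// Hypothetical helper: keeps every canonical name that is matched by
// at least one pattern, so the patterns act as alternatives.
std::vector<std::string>
select_locales (const std::vector<std::string> &patterns,
                const std::vector<std::string> &names)
{
    std::vector<std::string> selected;

    for (const std::string &name : names) {
        for (const std::string &pat : patterns) {
            // fnmatch() returns 0 when the string matches the pattern
            if (0 == fnmatch (pat.c_str (), name.c_str (), 0)) {
                selected.push_back (name);
                break;      // one match is enough for this name
            }
        }
    }
    return selected;
}
```

With the patterns {{{*-CA-1-ISO-8859-1}}}, {{{*-CA-1-UTF-8}}}, {{{*-US-1-ISO-8859-1}}} and {{{*-US-1-UTF-8}}}, the names {{{en-US-1-ISO-8859-1}}} and {{{fr-CA-1-UTF-8}}} would be selected, while {{{de-DE-1-ISO-8859-1}}} would not. Because the codeset is the last field, the dashes inside it cause no ambiguity for the glob.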
@@ -169, +170 @@

  
  ||<rowstyle="color:red"> /!\ Perhaps we should consider adding an additional parameter
to prepend the C/POSIX locales as there is no way to match them using the canonical locale
name matching rules we've laid out above.||
  
- The buffer returned by `rw_locale_query()` is owned by that function and is not to be dallocated
by the user. This buffer is currently planned to be left in use at program termination. If
it is deemed necessary, some additional code can be written to cleanup the buffer before program
exit, or we could require the user to deallocate the buffer when they are done with it.
+ The buffer returned by {{{rw_query_locales()}}} is owned by that function and is not to be
deallocated by the user. This buffer is currently planned to be left in use at program termination.
If it is deemed necessary, some additional code can be written to clean up the buffer before
program exit, or we could require the user to deallocate the buffer when they are done with
it.
+ 
+ [[Anchor(Ideas)]]
+ = Ideas =
+ 
+ I'm wondering why we didn't decide to use a callback system for this. It would allow us
to use arbitrary criteria to test a locale. The interface wouldn't always be 'grep-like',
but it would be very extensible. Something like this...
+ 
+ {{{
+ _TEST_EXPORT const char*
+ rw_locale_language (const char*);
+ 
+ _TEST_EXPORT const char*
+ rw_locale_territory (const char*);
+ 
+ _TEST_EXPORT const char*
+ rw_locale_codeset (const char*);
+ 
+ _TEST_EXPORT void
+ rw_locale_test (bool (*fun)(const char*, void*), void*);
+ }}}
+ 
+ The function {{{rw_locale_test()}}} would get a list of all installed locales, then pass
the name of those locales and the context pointer {{{p}}} to {{{fun}}}. The user function
could do whatever it wanted to decide if the locale is acceptable.
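
As a rough sketch of the callback idea, the code below enumerates a caller-supplied list of names instead of querying the installed locales, so it is a stand-in for {{{rw_locale_test()}}} and not the proposed function itself. The names {{{locale_test()}}}, {{{utf8_count}}} and {{{count_utf8()}}} are hypothetical, and treating a true return value as "stop enumerating" is an assumption, since the meaning of the return value has not been decided here.

```cpp
#include <cstddef>
#include <cstring>

// Callback type taken from the proposed rw_locale_test() declaration.
typedef bool (*locale_fun)(const char*, void*);

// Hypothetical stand-in for rw_locale_test(): the caller supplies the
// locale names rather than having them looked up.
void locale_test (const char* const names [], std::size_t count,
                  locale_fun fun, void* ctx)
{
    for (std::size_t i = 0; i != count; ++i) {
        // assumption: a true return value ends the enumeration early
        if (fun (names [i], ctx))
            break;
    }
}

// Context filled in by the example callback below.
struct utf8_count
{
    int total;   // number of names seen
    int utf8;    // number of names whose codeset is UTF-8
};

// Example callback: counts names of the form <anything>.UTF-8.
bool count_utf8 (const char* name, void* p)
{
    utf8_count* const ctx = static_cast<utf8_count*>(p);
    const char* const dot = std::strchr (name, '.');

    ++ctx->total;
    if (dot && 0 == std::strcmp (dot + 1, "UTF-8"))
        ++ctx->utf8;

    return false;   // keep going
}
```

The context pointer is what lets one enumeration function serve arbitrary criteria: the callback decides what to record, and the caller reads the result out of its own structure afterwards.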
+ 
+ This would make it quite simple to select only locales with a specific attribute. For example,
if we only wanted to select a locale with the largest MB_CUR_MAX value...
+ 
+ {{{
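+ #include <locale.h>   // setlocale(), LC_CTYPE
+ #include <stdlib.h>   // MB_CUR_MAX
+ #include <string.h>   // strcpy(), strlen()
+ 
+ struct _locale_mb_context
+ {
+   char name [128];
+   int cur_len;
+ };
+ 
+ static bool
+ _rw_locale_mb_fun (const char* name, void* p)
+ {
+   // setlocale() returns a null pointer if the locale cannot be set
+   const char* loc = setlocale (LC_CTYPE, name);
+   if (loc)
+   {
+     _locale_mb_context* context =
+         (_locale_mb_context*)p;
+ 
+     // MB_CUR_MAX reflects the LC_CTYPE locale that was just set
+     const int cur_len = (int)MB_CUR_MAX;
+     if (   context->cur_len < cur_len
+         && strlen (loc) < sizeof context->name)
+     {
+       strcpy (context->name, loc);
+       context->cur_len = cur_len;
+     }
+   }
+ 
+   return false;
+ }
+ 
+ static void
+ test_big_mb_locale ()
+ {
+   // start with an empty name and a length below any MB_CUR_MAX value
+   _locale_mb_context ctxt = { "", 0 };
+   rw_locale_test (_rw_locale_mb_fun, &ctxt);
+ 
+   // run the test on locale named by ctxt.name
+ }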
+ }}}
+ 
+ Or, to get a list of all locales that match brace expansion
+ 
+ {{{
+ struct _locale_match_context
+ {
+   const char* expr;   // NUL-separated patterns, ends with an empty string
+ };
+ 
+ static bool
+ _rw_locale_match (const char* name, void* p)
+ {
+   _locale_match_context* context =
+     (_locale_match_context*)p;
+ 
+   const char* language = rw_locale_language (name);
+   const char* country  = rw_locale_territory (name);
+   const char* codeset  = rw_locale_codeset (name);
+ 
+   // build the canonical name that the patterns are matched against
+   char buf [128];
+   sprintf (buf, "%s-%s-%s", language, country, codeset);
+ 
+   for (const char* s = context->expr;
+        *s; s += strlen (s) + 1)
+   {
+     if (rw_fnmatch (s, buf))
+     {
+       // run the test on locale named by name
+     }
+   }
+ 
+   return false;
+ }
+ 
+ static void
+ test_all_matches (const char* expr)
+ {
+   char buf [256];
+ 
+   char* res = rw_shell_expand (expr, 0, buf, sizeof (buf));
+ 
+   _locale_match_context ctxt = { res };
+   rw_locale_test (_rw_locale_match, &ctxt);
+ 
+   if (res != buf)
+     free (res);
+ }
+ }}}
  
  [[Anchor(References)]]
  = References =
