stdcxx-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Stdcxx Wiki] Update of "LocaleLookup" by TravisVitek
Date Tue, 11 Mar 2008 18:49:10 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Stdcxx Wiki" for change notification.

The following page has been changed by TravisVitek:
http://wiki.apache.org/stdcxx/LocaleLookup

------------------------------------------------------------------------------
  
  If we look up the canonical name {{{es_BO.ISO-8859-1}}} we will see three possible locale
names. If we look through our list of installed locales, we will find {{{es_BO}}}, but it
would be wrong to return that locale because it doesn't actually match on this particular
platform.
  
- So one solution for this might be to get the codeset name and store it in the mapping. This
assumes that it is safe to request a locale using a codeset even though the list
of installed locales didn't specify the codeset.
+ Now we use the above data to figure out canonical name from local name, or vice-versa.
+ 
+ {{{
+   es_BO.8859-15 maps to local name es_BO.ISO-8859-15
+   es_BO         maps to local name es_BO.ISO-8859-15 or es_BO.ISO-8859-1
+ }}}
+ 
+ How do we know which {{{es_BO}}} is right for this platform?
+ 
+ One possible direction here is to ask a locale for its codeset. Unfortunately, the returned
string needs to be mapped to a canonical string: it might return {{{iso88591}}} on one
platform, and {{{ISO-8859-1}}} on another.
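
A minimal sketch of asking a locale for its codeset, assuming a POSIX platform where {{{nl_langinfo(CODESET)}}} from {{{<langinfo.h>}}} is available. The helper name {{{native_codeset}}} is hypothetical and not part of stdcxx. The string it returns is the platform's own spelling, so it would still need the canonical mapping discussed here.

```cpp
#include <clocale>
#include <string>
#include <langinfo.h>   // POSIX: nl_langinfo, CODESET

// Hypothetical helper: returns the platform's spelling of the codeset used
// by the named locale, or an empty string if the locale cannot be set.
std::string native_codeset (const char* locname)
{
    // Remember the current LC_CTYPE setting so it can be restored.
    const char* const cur = std::setlocale (LC_CTYPE, 0);
    const std::string saved (cur ? cur : "C");

    std::string result;
    if (std::setlocale (LC_CTYPE, locname))
        result = nl_langinfo (CODESET);

    std::setlocale (LC_CTYPE, saved.c_str ());
    return result;
}
```

Calling {{{native_codeset ("es_BO")}}} on two platforms may well give two different spellings of the same codeset, which is the problem described above.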
+ 
+ If we need to ask a locale for its codeset and then use an additional mapping to get the
canonical codeset name, then why not just provide lookups for each component of the canonical
locale name and look them up individually?
+ 
+ We would need at least three different mappings. We would need four if we wanted to map
from a language code to a default territory code. This would be necessary so that we can map
locale names like {{{russian}}} or {{{ru}}} to an appropriate territory code.
+ 
+ {{{
+   # codeset mappings [one to many]
+   ISO-8859-1    8859-1 ISO8859-1
+   ISO-8859-15   8859-15 ISO8859-15
+   1252          CP-1252 IBM-1252
+   1254          CP-1254 IBM-1254
+ 
+   # language mappings [one to many]
+   en    English
+   es    Spanish
+   ab    Abkhazian abk
+   sq    Albanian alb sqi
+ 
+   # territory mappings [one to many]
+   US    "United States"
+   DE    Germany
+ 
+   # default territory for language mappings [one to one]
+   ru RU
+   cs CZ
+ }}}
+ 
+ The advantage of this scheme over the previous scheme is that if we encounter a locale that
we don't know, we might still be able to get a valid canonical name for it. With the previous scheme,
if we can't find a mapping for the name, we just use the original name as the canonical
name. With per-component lookups, we could build up a canonical name from whichever components
we do recognize, and that would increase the chances of being able to use the locale.
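
The per-component lookup could be sketched roughly as follows. This is an illustration only: the function {{{canonical_name}}}, the table names, and the decision to keep an unrecognized component unchanged are all assumptions, and the tables hold just the sample entries listed above (inverted so that a platform spelling maps to its canonical spelling).

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> AliasMap;

// Platform spelling -> canonical spelling, taken from the sample mappings.
static const AliasMap codesets = {
    {"8859-1", "ISO-8859-1"},   {"ISO8859-1", "ISO-8859-1"},
    {"8859-15", "ISO-8859-15"}, {"ISO8859-15", "ISO-8859-15"},
    {"CP-1252", "1252"},        {"IBM-1252", "1252"},
    {"CP-1254", "1254"},        {"IBM-1254", "1254"}
};
static const AliasMap languages = {
    {"English", "en"},   {"Spanish", "es"},
    {"Abkhazian", "ab"}, {"abk", "ab"},
    {"Albanian", "sq"},  {"alb", "sq"}, {"sqi", "sq"}
};
static const AliasMap territories = {
    {"United States", "US"}, {"Germany", "DE"}
};
// Canonical language -> default territory.
static const AliasMap default_territory = {
    {"ru", "RU"}, {"cs", "CZ"}
};

// Look up one component; an unknown spelling is kept unchanged.
static std::string lookup (const AliasMap& m, const std::string& key)
{
    const AliasMap::const_iterator it = m.find (key);
    return it == m.end () ? key : it->second;
}

// Hypothetical: build language[_TERRITORY][.codeset] from a platform name.
std::string canonical_name (const std::string& native)
{
    const std::string::size_type dot = native.find ('.');
    const std::string head = native.substr (0, dot);
    const std::string::size_type us = head.find ('_');

    const std::string lang = lookup (languages, head.substr (0, us));

    std::string terr;
    if (us != std::string::npos) {
        terr = lookup (territories, head.substr (us + 1));
    }
    else {
        // No territory given: fall back to the language's default, if any.
        const AliasMap::const_iterator it = default_territory.find (lang);
        if (it != default_territory.end ())
            terr = it->second;
    }

    std::string result = lang;
    if (!terr.empty ())
        result += '_' + terr;
    if (dot != std::string::npos)
        result += '.' + lookup (codesets, native.substr (dot + 1));
    return result;
}
```

With these tables, a name such as {{{Spanish_BO.8859-1}}} is rebuilt component by component even though no table mentions that full name, and a bare {{{ru}}} picks up its default territory.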
  
  Another issue is that the data associated with each of the canonical locales, like {{{MB_CUR_LEN}}},
is different on each platform. The {{{ar_DZ.UTF-8}}} locale uses a 6 byte codeset on Linux,
but a 4 byte codeset on other platforms.
  
- I think the solution for this would be to not store the MB_CUR_LEN value in the file, but
capture it and append it to the canonical locale name when we enumerate the installed locales.
+ I think the logical solution for this would be to not store the {{{MB_CUR_LEN}}} value in
the file, but capture it and append it to the canonical locale name when we enumerate the
installed locales. See notes in Part3 about {{{MB_CUR_LEN}}}.
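
Capturing the value at enumeration time might look like the sketch below. The page writes {{{MB_CUR_LEN}}}; this sketch assumes the standard C macro {{{MB_CUR_MAX}}} from {{{<stdlib.h>}}} is what is meant, since its value depends on the current {{{LC_CTYPE}}} locale. The helper name {{{with_mb_len}}} is hypothetical.

```cpp
#include <clocale>
#include <cstdlib>   // MB_CUR_MAX
#include <string>

// Hypothetical helper: returns canonical + "." + MB_CUR_MAX as measured in
// the platform locale `native`, or an empty string if that locale cannot
// be activated on this system.
std::string with_mb_len (const std::string& canonical, const char* native)
{
    // Remember the current LC_CTYPE setting so it can be restored.
    const char* const cur = std::setlocale (LC_CTYPE, 0);
    const std::string saved (cur ? cur : "C");

    std::string result;
    if (std::setlocale (LC_CTYPE, native)) {
        // MB_CUR_MAX reflects the locale that was just activated.
        const unsigned long len = static_cast<unsigned long> (MB_CUR_MAX);
        result = canonical + '.' + std::to_string (len);
    }

    std::setlocale (LC_CTYPE, saved.c_str ());
    return result;
}
```

Because the number is measured on the running system rather than read from the file, the same canonical name can end up with a different suffix on different platforms, which is the intent described above.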
  
  [[Anchor(Part3)]]
  = Part 3 (STDCXX-716) =
@@ -116, +153 @@

  The proposed interface to all of this is a single public function named rw_query_locales().
The signature would be...
  
  {{{
-   char* rw_query_locales(const char* query, size_t count);
+   char* rw_query_locales (const char* query, size_t count);
  }}}
  
  The {{{query}}} parameter will be the query string. The {{{count}}} parameter is the maximum
number of locales to return. This allows you to easily limit the number of locales tested.
  
- The expected format of the query string is similar to what is described above, except that
the requested MB_CUR_LEN value will be expected to be part of the query string. The accepted
MB_CUR_LEN value would be separated from the canonical locale name expression with a period.
An example query string...
+ The expected format of the query string is similar to what is described above, except that
the requested {{{MB_CUR_LEN}}} value will be expected to be part of the query string. The
accepted {{{MB_CUR_LEN}}} value would be separated from the canonical locale name expression
with a period. An example query string...
  
  {{{
-    "zh_*.*.{5..3} *_FR.*.1"
+    zh_*.*.{5..3} *_FR.*.1
  }}}
  
  This would match all 5, 4 and 3 byte encodings of the Chinese language in any country, then
all 1 byte encodings for any language spoken in France.
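
How such a query could be evaluated is sketched below under several assumptions: patterns are separated by whitespace and tried in order, {{{*}}} matches any run of characters, and {{{{a..b}}}} expands to each integer from a to b in the order written. The function {{{query_locales}}} is a hypothetical stand-in that takes the list of installed names as a parameter and returns a vector; it is not the proposed {{{rw_query_locales}}}, whose grammar and return format are only partly described here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Glob match where '*' matches any (possibly empty) run of characters.
static bool glob_match (const char* pat, const char* str)
{
    if (*pat == '\0')
        return *str == '\0';
    if (*pat == '*') {
        // Try every possible length for the run matched by '*'.
        for (;; ++str) {
            if (glob_match (pat + 1, str)) return true;
            if (*str == '\0') return false;
        }
    }
    return *pat == *str && glob_match (pat + 1, str + 1);
}

// Expand the first {a..b} in a pattern into one pattern per integer,
// in the order written (so {5..3} yields 5, 4, 3).
static std::vector<std::string> expand_range (const std::string& pat)
{
    std::vector<std::string> out;
    const std::string::size_type npos = std::string::npos;
    const std::string::size_type open = pat.find ('{');
    const std::string::size_type dots = open == npos ? npos : pat.find ("..", open);
    const std::string::size_type close = open == npos ? npos : pat.find ('}', open);
    if (dots == npos || close == npos || close < dots) {
        out.push_back (pat);   // no well-formed range: use the pattern as is
        return out;
    }
    const int first = std::atoi (pat.substr (open + 1, dots - open - 1).c_str ());
    const int last  = std::atoi (pat.substr (dots + 2, close - dots - 2).c_str ());
    const int step  = first <= last ? 1 : -1;
    for (int i = first; ; i += step) {
        out.push_back (pat.substr (0, open) + std::to_string (i)
                       + pat.substr (close + 1));
        if (i == last) break;
    }
    return out;
}

// Hypothetical: return up to `count` installed names, in query order.
std::vector<std::string>
query_locales (const std::string& query,
               const std::vector<std::string>& installed,
               std::size_t count)
{
    std::vector<std::string> result;
    std::istringstream in (query);
    std::string pat;
    while (in >> pat) {
        const std::vector<std::string> alts = expand_range (pat);
        for (std::size_t i = 0; i != alts.size (); ++i) {
            for (std::size_t j = 0; j != installed.size (); ++j) {
                if (result.size () == count)
                    return result;
                const std::string& name = installed[j];
                if (glob_match (alts[i].c_str (), name.c_str ())
                    && std::find (result.begin (), result.end (), name) == result.end ())
                    result.push_back (name);
            }
        }
    }
    return result;
}
```

Given an installed list whose names carry the appended byte length (for instance a made-up entry like {{{zh_CN.UTF-8.4}}}), the example query above would return the Chinese entries first, widest encoding first, followed by the single-byte entries for France.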
