stdcxx-dev mailing list archives

From "Travis Vitek" <>
Subject RE: [Stdcxx Wiki] Update of "LocaleLookup" by MartinSebor
Date Tue, 11 Mar 2008 21:07:18 GMT

>From: Apache Wiki [] 
>The new 
>interface will need to make it easy to specify such a set of 
>locales without explicitly naming them, and it will need to
>retrieve such locales without returning duplicates.

As mentioned before I don't know a good way to avoid duplicates other
than to compare every attribute of each facet of each locale to all of
the other locales. Just testing to see if the return from setlocale() is
the same as the input string is not enough. The user could have installed
locales that have unique names but are copies of the data from some
other locale.
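
To illustrate what comparing by data rather than by name could look like, here is a minimal sketch. It is not part of stdcxx or the test driver: the helper names `numpunct_fingerprint` and `same_numpunct_data` are hypothetical, and it samples only the `std::numpunct<char>` facet, whereas a real duplicate check would have to cover every attribute of every facet.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: collects a few numpunct<char> attributes of a locale
// into one string so that two locales can be compared by data, not by name.
// Assumption: sampling one facet is enough for illustration only; a real
// duplicate check would need to look at every facet.
std::string numpunct_fingerprint (const std::locale &loc)
{
    const std::numpunct<char> &np =
        std::use_facet<std::numpunct<char> >(loc);

    std::string fp;
    fp += np.decimal_point ();
    fp += '|';
    fp += np.thousands_sep ();
    fp += '|';
    fp += np.grouping ();
    fp += '|';
    fp += np.truename ();
    fp += '|';
    fp += np.falsename ();
    return fp;
}

// Treats two locales as duplicates when the sampled data match,
// whatever their names happen to be.
bool same_numpunct_data (const std::locale &a, const std::locale &b)
{
    return numpunct_fingerprint (a) == numpunct_fingerprint (b);
}
```

Two locales installed under different names but built from the same source data would compare equal here, which a comparison of the setlocale() return value would miss.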

>The interface should make it easy to 
>express conjunction, disjunction, and negation of the terms 
>(parameters) and support (a perhaps simplified version of) 
>p09.html#tag_09_03 Basic Regular Expression] syntax.

Conjunction, disjunction and negation? Are you saying you want to be
able to select all locales that are _not_ in some set, something like
you would get with a caret (^) in a grep expression?

I'm hoping that I'm just misunderstanding your comments. If not, then
this is news to me and I'm a bit curious just how this addition is
necessary to minimize the number of locales tested [i.e. the objective].

>decided to use shell brace expansion as a means of expressing 
>logical conjunction between terms: a valid brace expression is 
>expanded to obtain a set of terms implicitly connected by a 
>logical AND. Individual ('\n'-separated) lines of the query 
>string are taken to be implicitly connected by a logical OR. 
>This approach models the 
>tml grep] interface with each line loosely corresponding to 
>the argument of the `-e` option to `grep`.

I've seen you mention the '\n'-separated list before, but I still
can't make sense of it. Are you saying that to select `en_US.*' with a 1
byte encoding or `zh_*.UTF-8' with a 2, 3, or 4 byte encoding, I would
write the following query?

  const char* locales = rw_locale_query ("en_US.* 1\nzh_*.UTF-8 {2..4}",

I don't see why that would be necessary. You can do it with the
following query using normal brace expansion, and it's human readable.

  const char* locales = rw_locale_query ("{en_US.* 1,zh_*.UTF-8 {2..4}}", 10);

I know that the '\n' is how you'd use `grep -e', but does it really make
sense? We aren't using `grep -e' here.
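
As a rough check that the two spellings name the same terms, here is a simplified brace expander. It is only a sketch and not the rw_locale_query() implementation: `expand`, `expand_query`, and `Terms` are hypothetical names, whitespace is kept as part of a term (unlike in a shell), and only comma lists and integer ranges such as {2..4} are handled. How the resulting terms are then combined (AND or OR) is left open.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Terms;

// Parses an optionally signed decimal integer; returns false otherwise.
static bool parse_int (const std::string &s, long &val)
{
    std::size_t i = (!s.empty () && s[0] == '-') ? 1 : 0;
    if (i == s.size ())
        return false;
    long n = 0;
    for (; i != s.size (); ++i) {
        if (s[i] < '0' || s[i] > '9')
            return false;
        n = n * 10 + (s[i] - '0');
    }
    val = s[0] == '-' ? -n : n;
    return true;
}

// Simplified brace expansion of one line. Unbalanced braces, and braces
// holding neither a comma list nor an integer range, stay literal.
Terms expand (const std::string &s)
{
    const std::size_t open = s.find ('{');
    if (open == std::string::npos)
        return Terms (1, s);

    // Locate the matching close brace and the top-level commas.
    std::size_t close = std::string::npos;
    std::vector<std::size_t> cuts (1, open);
    int depth = 0;
    for (std::size_t i = open; i != s.size (); ++i) {
        if (s[i] == '{')
            ++depth;
        else if (s[i] == '}' && --depth == 0) {
            close = i;
            break;
        }
        else if (s[i] == ',' && depth == 1)
            cuts.push_back (i);
    }
    if (close == std::string::npos)
        return Terms (1, s);
    cuts.push_back (close);

    const std::string prefix = s.substr (0, open);
    const std::string suffix = s.substr (close + 1);

    Terms alts;
    for (std::size_t i = 0; i + 1 != cuts.size (); ++i)
        alts.push_back (s.substr (cuts[i] + 1, cuts[i + 1] - cuts[i] - 1));

    if (alts.size () == 1) {
        // No comma: accept only an integer range such as {2..4}.
        const std::string body = alts[0];
        const std::size_t dots = body.find ("..");
        long lo = 0, hi = 0;
        if (   dots == std::string::npos
            || !parse_int (body.substr (0, dots), lo)
            || !parse_int (body.substr (dots + 2), hi)) {
            // Not a brace expression: keep the braces as literal text.
            Terms rest = expand (suffix);
            for (std::size_t i = 0; i != rest.size (); ++i)
                rest[i] = prefix + '{' + body + '}' + rest[i];
            return rest;
        }
        alts.clear ();
        const long step = lo <= hi ? 1 : -1;
        for (long n = lo; ; n += step) {
            alts.push_back (std::to_string (n));
            if (n == hi)
                break;
        }
    }

    // Substitute each alternative and expand whatever braces remain.
    Terms out;
    for (std::size_t i = 0; i != alts.size (); ++i) {
        const Terms sub = expand (prefix + alts[i] + suffix);
        out.insert (out.end (), sub.begin (), sub.end ());
    }
    return out;
}

// Expands every '\n'-separated line of a query and concatenates the results.
Terms expand_query (const std::string &query)
{
    Terms out;
    for (std::size_t start = 0; ; ) {
        const std::size_t nl = query.find ('\n', start);
        const Terms sub = expand (query.substr (start, nl - start));
        out.insert (out.end (), sub.begin (), sub.end ());
        if (nl == std::string::npos)
            break;
        start = nl + 1;
    }
    return out;
}
```

Under these assumptions, expand_query ("en_US.* 1\nzh_*.UTF-8 {2..4}") and expand_query ("{en_US.* 1,zh_*.UTF-8 {2..4}}") both produce the same four terms: `en_US.* 1', `zh_*.UTF-8 2', `zh_*.UTF-8 3', and `zh_*.UTF-8 4'.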

