incubator-stdcxx-dev mailing list archives

From Martin Sebor
Subject Re: [Stdcxx Wiki] Update of "LocaleLookup" by MartinSebor
Date Wed, 12 Mar 2008 20:00:56 GMT
Travis Vitek wrote:
>> From: Apache Wiki [] 
>> The new 
>> interface will need to make it easy to specify such a set of 
>> locales without explicitly naming them, and it will need to
>> retrieve such locales without returning duplicates.
> As mentioned before I don't know a good way to avoid duplicates other
> than to compare every attribute of each facet of each locale to all of
> the other locales. Just testing to see if the return from setlocale() is
> the same as the input string is not enough. The user could have installed
> locales that have unique names but are copies of the data from some
> other locale.

True, but we don't care about how long the test might run on
some user's system. What we care about here is that *we* don't
run tests unnecessarily on our own build servers, and we can
safely make the simplifying assumption that there are no user
defined locales installed on them.

>> The interface should make it easy to 
>> express conjunction, disjunction, and negation of the terms 
>> (parameters) and support (a perhaps simplified version of) 
>> [
>> p09.html#tag_09_03 Basic Regular Expression] syntax.
> Conjunction, disjunction and negation? Are you saying you want to be
> able to select all locales that are _not_ in some set, something like
> you would get with a caret (^) in a grep expression?

No, I meant something simple like grep -v.

> I'm hoping that I'm just misunderstanding your comments. If not, then
> this is news to me and I'm a bit curious just how this addition is
> necessary to minimize the number of locales tested [i.e. the objective].

It may not be necessary. I included it for completeness, thinking
if it wasn't already there it could be easily added in the form
of an argument of the function. If it isn't there we can leave
it out until we need it.

>> We've 
>> decided to use shell brace expansion as a means of expressing 
>> logical conjunction between terms: a valid brace expression is 
>> expanded to obtain a set of terms implicitly connected by a 
>> logical AND. Individual ('\n'-separated) lines of the query 
>> string are taken to be implicitly connected by a logical OR. 
>> This approach models the 
>> [
>> tml grep] interface with each line loosely corresponding to 
>> the argument of the `-e` option to `grep`.
> I've seen you mention the '\n' separated list thing before, but I still
> can't make sense of it. Are you saying

In my mind the query expression consists of terms connected
by operators for conjunction and disjunction (let's forget
about negation for now). E.g., like so:

   qry-expr ::= <dis-expr>
   dis-expr ::= <con-expr> | <dis-expr> <NL> <con-expr>
   con-expr ::= <term> | <term> <SP> <con-expr>

For example:

   "foo bar" is a con-expr of the terms "foo" and "bar" denoting
   the intersection of foo and bar, and

   "123 xyz\nKLM" is a dis-expr of the terms "123 xyz" and "KLM"
   denoting the union of the two terms. "123 xyz" is itself
   a con-expr denoting the intersection of 123 and xyz.
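
To make the grammar above concrete, here is a minimal sketch of a
parser for it. The names parse_query, ConExpr, and DisExpr are
hypothetical and not part of the test driver; the sketch only assumes
that <NL> separates con-exprs and that whitespace separates terms.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types, for illustration only:
// a con-expr is a list of terms that are ANDed together, and
// a dis-expr is a list of con-exprs that are ORed together.
typedef std::vector<std::string> ConExpr;
typedef std::vector<ConExpr>     DisExpr;

// Splits a query string into lines (con-exprs) and each line
// into whitespace-separated terms. Blank lines are ignored.
DisExpr parse_query (const std::string &query)
{
    DisExpr result;
    std::istringstream lines (query);
    std::string line;

    while (std::getline (lines, line)) {      // <NL> separates con-exprs
        std::istringstream terms (line);
        std::string term;
        ConExpr con;

        while (terms >> term)                 // <SP> separates terms
            con.push_back (term);

        if (!con.empty ())
            result.push_back (con);
    }

    return result;
}
```

With this sketch, "123 xyz\nKLM" parses into two con-exprs: one with
the terms "123" and "xyz", and one with the single term "KLM".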

> that to select `en_US.*' with a 1
> byte encoding or `zh_*.UTF-8' with a 2, 3, or 4 byte encoding, I would
> write the following query?

I think it might be simpler to keep things abstract, but given my
specification above, a simple query string would look like this:

   "en_US.*    1\n"
   "zh_*.UTF-8 2\n"
   "zh_*.UTF-8 3\n"
   "zh_*.UTF-8 4\n"

for the equivalent of:

      locale == "en_US.*"    && MB_CUR_MAX == 1
   || locale == "zh_*.UTF-8" && MB_CUR_MAX == 2
   || locale == "zh_*.UTF-8" && MB_CUR_MAX == 3
   || locale == "zh_*.UTF-8" && MB_CUR_MAX == 4
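
The equivalence above can be sketched as a small evaluator. This is
only an illustration under several assumptions that the thread has not
settled: LocaleInfo, glob_match, term_match, and query_match are
hypothetical names, a term made up only of digits is compared against
MB_CUR_MAX, and any other term is treated as a shell-style pattern in
which '*' matches any sequence of characters.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical record; the real set of attributes is still undecided.
struct LocaleInfo {
    std::string name;         // e.g., "en_US.ISO-8859-1"
    int         mb_cur_max;   // value of MB_CUR_MAX in that locale
};

// Assumption: '*' matches any (possibly empty) sequence of characters
// and every other character matches only itself.
static bool glob_match (const char *pat, const char *str)
{
    if ('\0' == *pat)
        return '\0' == *str;

    if ('*' == *pat) {
        // try every possible length for the sequence matched by '*'
        for (; ; ++str) {
            if (glob_match (pat + 1, str))
                return true;
            if ('\0' == *str)
                return false;
        }
    }

    return *pat == *str && glob_match (pat + 1, str + 1);
}

// Assumption: an all-digit term is compared against MB_CUR_MAX and
// any other term is matched against the locale name.
static bool term_match (const std::string &term, const LocaleInfo &loc)
{
    if (std::string::npos == term.find_first_not_of ("0123456789"))
        return std::atoi (term.c_str ()) == loc.mb_cur_max;

    return glob_match (term.c_str (), loc.name.c_str ());
}

// Lines are ORed together; the terms on each line are ANDed together.
bool query_match (const std::string &query, const LocaleInfo &loc)
{
    std::istringstream lines (query);
    std::string line;

    while (std::getline (lines, line)) {
        std::istringstream terms (line);
        std::string term;
        bool any = false;     // true once the line has at least one term
        bool all = true;      // true while every term so far has matched

        while (terms >> term) {
            any = true;
            all = all && term_match (term, loc);
        }

        if (any && all)
            return true;
    }

    return false;
}
```

Under these assumptions a locale named "zh_CN.UTF-8" with MB_CUR_MAX
of 3 satisfies the third line of the query, while "en_US.UTF-8" with
MB_CUR_MAX of 4 satisfies none of them.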

I'm not sure how you could use brace expressions here. Maybe it
should be the other way around (<SP> should be OR and <NL> AND).
But then the grep -e idea is out the window. Or maybe we need
a way to denote/group terms. Then we might be able to say:

   "en_US.*    1\n"
   "zh_*.UTF-8 ({2..4})"

expand it to

   "en_US.*    1\n"
   "zh_*.UTF-8 (2 3 4)"

and "know" that the spaces in "2 3 4" denote OR even though the
space in "zh_*.UTF-8 (" denotes an AND. Ugh. I'm not sure I like
this all that much better.
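
The expansion step itself (turning "{2..4}" into "2 3 4") is easy to
sketch, whatever the spaces end up meaning. The helper names below are
hypothetical, and the sketch deliberately handles only ascending
numeric sequences, not comma lists or nested braces.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Parses "M..N" where M and N are non-negative decimal integers.
// Returns true only for a well-formed range with M <= N.
static bool parse_range (const std::string &body, long &first, long &last)
{
    const char digits[] = "0123456789";
    const std::string::size_type dots = body.find ("..");

    if (std::string::npos == dots)
        return false;

    const std::string lhs = body.substr (0, dots);
    const std::string rhs = body.substr (dots + 2);

    if (   lhs.empty () || rhs.empty ()
        || std::string::npos != lhs.find_first_not_of (digits)
        || std::string::npos != rhs.find_first_not_of (digits))
        return false;

    first = std::atol (lhs.c_str ());
    last  = std::atol (rhs.c_str ());

    return first <= last;
}

// Hypothetical helper: replaces each "{M..N}" in the string with the
// space-separated integers M through N. Any other brace pair is
// copied verbatim. Nested braces are not handled.
std::string expand_sequences (const std::string &str)
{
    std::string result;
    std::string::size_type pos = 0;

    for (; ; ) {
        const std::string::size_type open = str.find ('{', pos);
        const std::string::size_type close =
            std::string::npos == open ? open : str.find ('}', open);

        if (std::string::npos == close) {
            result += str.substr (pos);       // no more brace pairs
            return result;
        }

        long first = 0;
        long last  = 0;

        if (parse_range (str.substr (open + 1, close - open - 1),
                         first, last)) {
            result += str.substr (pos, open - pos);

            std::ostringstream seq;
            for (long i = first; i <= last; ++i)
                seq << (i == first ? "" : " ") << i;

            result += seq.str ();
        }
        else {
            // not a numeric sequence: keep the text as it is
            result += str.substr (pos, close + 1 - pos);
        }

        pos = close + 1;
    }
}
```

The sketch leaves the parentheses in place, so the question of how to
tell the OR spaces inside "(2 3 4)" from the AND space before the
parenthesis is still open.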

>   const char* locales = rw_locale_query ("en_US.* 1\nzh_*.UTF-8 {2..4}", 10);
> I don't see why that would be necessary. You can do it with the
> following query using normal brace expansion, and it's human readable.
>   const char* locales = rw_locale_query ("{en_US.* 1,zh_*.UTF-8 {2..4}}", 10);

What's "{en_US.* 1,zh_*.UTF-8 {2..4}}" supposed to expand to?
Bash 3.2 doesn't expand it. I suppose it could be

   "en_US.* 1 zh_*.UTF-8 2 3 4" or
   "en_US.* 1 zh_*.UTF-8 2 zh_*.UTF-8 3 zh_*.UTF-8 4"

Either way I think I'm getting confused by the lack of distinction
between what's OR and what's AND.

> I know that the '\n' is how you'd use `grep -e', but does it really make
> sense? We aren't using `grep -e' here.

I'm trying to model the interface on something we all know how
to use. grep -e seemed the closest example I could think of of
a known interface that would let us do what we want.

Maybe it would help to go back to the basics and try to approach
this by analyzing the problem first. Would putting this together
be helpful?

   1. the set of locale attributes we want to keep track of
      in our locale "database"

   2. one or more possible formats of the database

   3. the kinds of queries done in our locale tests, and the
      ones we expect to do in future tests

With that, we can create a prototype solution using an existing
query language of our choice (such as grep). Once that works,
the grammar should naturally fall out and we can reimplement
the prototype in the test driver.

