incubator-stdcxx-dev mailing list archives

From Travis Vitek <>
Subject Re: [Stdcxx Wiki] Update of "LocaleLookup" by MartinSebor
Date Mon, 17 Mar 2008 08:11:30 GMT

Martin Sebor wrote:
> But we do need to come up with a sound specification of the query syntax
> before implementing any more code.

Okay, the proposed query syntax grammar is essentially the same as that being
used for the <config> value in xfail.txt. So we have

  <match> is a shell globbing pattern in the format below. All fields
  are required.

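  iso-language ::= ISO-639-1 or ISO-639-2 two or three character language code
  iso-country  ::= ISO-3166 two character country code
  iana-codeset ::= IANA codeset name with '-' replaced or removed
  mb_cur_len   ::= maximum number of bytes per character (as in MB_CUR_MAX)

  match        ::= iso-language '-' iso-country '-' iana-codeset '-' mb_cur_len
  match_list   ::= match | match ' ' match_list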
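To make the field order concrete, here is a minimal C sketch of how a locale name might be turned into the string that a <match> pattern is compared against. The function name make_locale_key and its signature are hypothetical; it assumes the field order shown by the en-US-*-1 example (language, country, codeset, byte length) and assumes '-' is removed from the codeset rather than replaced.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a "language-country-codeset-mb_cur_len" key
 * from a POSIX-style locale name such as "en_US.ISO-8859-1".
 * Assumption: '-' is removed from the codeset (the grammar also allows
 * replacing it). A trailing "@modifier" is ignored.
 * Returns 0 on success, -1 if the name has no "_" followed by "." or the
 * key does not fit in key_size bytes. */
int make_locale_key(const char *locale_name, int mb_cur_len,
                    char *key, size_t key_size)
{
    const char *underscore = strchr(locale_name, '_');
    const char *dot = strchr(locale_name, '.');
    if (!underscore || !dot || dot < underscore)
        return -1;

    /* Copy the codeset, dropping every '-' and stopping at '@'. */
    char codeset[64];
    size_t n = 0;
    for (const char *p = dot + 1; *p && *p != '@'; ++p) {
        if (*p == '-')
            continue;
        if (n + 1 >= sizeof codeset)
            return -1;
        codeset[n++] = *p;
    }
    codeset[n] = '\0';

    /* language = text before '_', country = text between '_' and '.' */
    int len = snprintf(key, key_size, "%.*s-%.*s-%s-%d",
                       (int)(underscore - locale_name), locale_name,
                       (int)(dot - underscore - 1), underscore + 1,
                       codeset, mb_cur_len);
    return (len < 0 || (size_t)len >= key_size) ? -1 : 0;
}
```

With this sketch, "en_US.ISO-8859-1" with a 1-byte encoding becomes en-US-ISO88591-1. It does not normalize case, so a glibc-style name such as en_US.utf8 would produce a key containing utf8, which a UTF8 pattern would not match; the grammar above does not say whether keys should be case-folded.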

So the previous example, selecting `en_US.*' with a 1-byte encoding or
`zh_*.UTF-8' with a 2-, 3-, or 4-byte encoding, would use the following query:

  en-US-*-1 zh-*-UTF8-2 zh-*-UTF8-3 zh-*-UTF8-4

This long expression could be simplified using brace expansion:

  en-US-*-1 zh-*-UTF8-{2,3,4}
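Brace expansion is normally performed by the shell, so a C implementation would need its own. A minimal sketch, assuming at most one non-nested brace group per pattern (expand_braces is a hypothetical name), that turns one pattern into a space-separated match_list:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: expand a single, non-nested brace group, e.g.
 *   "zh-*-UTF8-{2,3,4}" -> "zh-*-UTF8-2 zh-*-UTF8-3 zh-*-UTF8-4".
 * A pattern without a complete "{...}" group is copied unchanged.
 * Returns 0 on success, -1 if the result does not fit in out_size bytes. */
int expand_braces(const char *pattern, char *out, size_t out_size)
{
    const char *open = strchr(pattern, '{');
    const char *close = open ? strchr(open, '}') : NULL;

    if (!open || !close) {
        int len = snprintf(out, out_size, "%s", pattern);
        return (len < 0 || (size_t)len >= out_size) ? -1 : 0;
    }

    int prefix_len = (int)(open - pattern);  /* text before '{' */
    const char *suffix = close + 1;          /* text after '}'  */
    const char *alt = open + 1;              /* current alternative */
    size_t used = 0;

    for (;;) {
        /* Find the end of this alternative: the next ',' or the '}'. */
        const char *end = alt;
        while (end < close && *end != ',')
            ++end;

        /* Emit prefix + alternative + suffix, space-separated. */
        int len = snprintf(out + used, out_size - used, "%s%.*s%.*s%s",
                           used ? " " : "",
                           prefix_len, pattern,
                           (int)(end - alt), alt,
                           suffix);
        if (len < 0 || (size_t)len >= out_size - used)
            return -1;
        used += (size_t)len;

        if (end == close)
            break;
        alt = end + 1;  /* skip the ',' */
    }
    return 0;
}
```

Nested groups and multiple groups in one pattern, both of which bash accepts, are not handled by this sketch.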

I propose that we not support the BRE syntax, simply because it is so
complex. Yes, it might be quite easy to prototype a solution using grep and
other shell utilities, but providing a complete implementation in C [where
we actually need it] is going to be difficult at best. Shell globbing
should be sufficient to handle the cases we need to satisfy the objective.

I suppose you could consider en-US-*-1 to mean "language=en" and "country=US"
and "codeset=*" and "mb_cur_len=1", so that '-' represents an intersection
operation, but I prefer to think of the entire expression as either a match
or not a match.
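Under the "match or not" reading, a match_list is satisfied when any one of its patterns matches the whole locale key. A minimal sketch of that evaluation, assuming keys of the form en-US-ISO88591-1 and using POSIX fnmatch() for the per-pattern test (query_matches is a hypothetical name):

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: return true if any space-separated pattern in
 * query matches the entire key (assumed "lang-country-codeset-len" form).
 * Uses POSIX fnmatch(), which returns 0 on a match.
 * Assumption: patterns of 128 characters or more are skipped. */
bool query_matches(const char *query, const char *key)
{
    char pattern[128];
    const char *p = query;

    while (*p) {
        while (*p == ' ')
            ++p;                         /* skip separators */
        const char *end = strchr(p, ' ');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == 0)
            break;                       /* only trailing spaces remained */
        if (len < sizeof pattern) {
            memcpy(pattern, p, len);
            pattern[len] = '\0';
            if (fnmatch(pattern, key, 0) == 0)
                return true;             /* one matching pattern suffices */
        }
        p += len;
    }
    return false;
}
```

Because each pattern is tested against the whole key, a list like "en-US-*-1 zh-*-UTF8-3" behaves as a union of patterns, with no per-field operators to parse.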

Martin Sebor wrote:
> I think it's great
> to put together a prototype at the same time, just as long as it's
> understood that the prototype might need to change as we discover
> flaws in it or better ways of doing it.

I have no problem with flaws or small improvements. When we start talking
about implementing a regular expression parser I get concerned.
