Message-ID: <16089939.post@talk.nabble.com>
Date: Mon, 17 Mar 2008 01:11:30 -0700 (PDT)
From: Travis Vitek
To: dev@stdcxx.apache.org
Subject: Re: [Stdcxx Wiki] Update of "LocaleLookup" by MartinSebor
In-Reply-To: <47D89A35.7090208@roguewave.com>
References: <47D83678.9060103@roguewave.com> <47D89A35.7090208@roguewave.com>

Martin Sebor wrote:
>
> But we do need to come up with a sound specification of the query syntax
> before implementing any more code.
>

Okay, the proposed query syntax grammar is essentially the same as that
being used for the value in xfail.txt. So what we have is a shell
globbing pattern in the format below. All fields are required.

  iso-language ::= ISO-639-1 or ISO-639-2 two or three character language code
  iso-country  ::= ISO-3166 two character country code
  iana-codeset ::= IANA codeset name with '-' replaced or removed

  match        ::= <iso-language>-<iso-country>-<iana-codeset>-<mb_cur_len>
  match_list   ::= match | match ' ' match_list

So the previous example to select `en_US.*' with a 1 byte encoding or
`zh_*.UTF-8' with a 2, 3, or 4 byte encoding would use the following
query string.

  en-US-*-1 zh-*-UTF8-2 zh-*-UTF8-3 zh-*-UTF8-4

This long expression could be simplified with a brace expansion.

  en-US-*-1 zh-*-UTF8-{2,3,4}

I propose that we not support the BRE syntax, simply because it is so
complex. Yes, it might be quite easy to prototype a solution using grep
and other shell utilities, but providing a complete implementation in C
[where we actually need it] is going to be difficult at best. Shell
globbing should be sufficient for the cases we need to handle.

I suppose you could read en-US-*-1 as "language=en" and "country=US" and
"codeset=*" and "mb_cur_len=1", so that '-' represents an intersection
operation, but I prefer to think of the entire expression as either a
match or not a match.

Martin Sebor wrote:
>
> I think it's great
> to put together a prototype at the same time, just as long as it's
> understood that the prototype might need to change as we discover
> flaws in it or better ways of doing it.
>

I have no problem with flaws or small improvements. When we start
talking about implementing a regular expression parser I get concerned.
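
To make the globbing proposal above concrete, here is a minimal sketch
of how a query string could be matched in C. It assumes POSIX fnmatch()
is available and that each installed locale has already been rendered as
a "language-country-codeset-mb_cur_len" string. The helper names
match_one() and match_query(), the fixed 256-byte buffers, and the
single-level (non-nested) brace handling are all assumptions made for
illustration, not part of the proposal.

```c
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `name` matches one pattern that may
   contain {a,b,c} brace groups (not nested), otherwise 0. */
static int match_one(const char *pat, const char *name)
{
    const char *open  = strchr(pat, '{');
    const char *close = open ? strchr(open, '}') : NULL;
    char buf[256];

    if (!open || !close)              /* no brace group: plain glob */
        return 0 == fnmatch(pat, name, 0);

    /* Substitute each comma-separated alternative in turn and retry;
       the recursion handles any later brace group in the pattern. */
    for (const char *alt = open + 1; alt <= close; ) {
        size_t len = strcspn(alt, ",}");
        int n = snprintf(buf, sizeof buf, "%.*s%.*s%s",
                         (int)(open - pat), pat, (int)len, alt, close + 1);

        if (n >= 0 && (size_t)n < sizeof buf && match_one(buf, name))
            return 1;
        alt += len + 1;               /* skip past the ',' or '}' */
    }
    return 0;
}

/* Hypothetical helper: return 1 if `name` matches any of the
   space-separated patterns in `query` (the match_list rule). */
static int match_query(const char *query, const char *name)
{
    char pat[256];

    while (*query) {
        size_t len;

        query += strspn(query, " ");  /* skip separators */
        len = strcspn(query, " ");
        if (len == 0)
            break;
        if (len < sizeof pat) {       /* overlong patterns are skipped */
            memcpy(pat, query, len);
            pat[len] = '\0';
            if (match_one(pat, name))
                return 1;
        }
        query += len;
    }
    return 0;
}
```

With the query from this message, "en-US-*-1 zh-*-UTF8-{2,3,4}",
match_query() should return 1 for a name such as "zh-CN-UTF8-3" and 0
for "zh-CN-GB2312-2". Note that fnmatch() with no flags lets '*' match
'-' as well, which is one reason the codeset field has its '-'
characters replaced or removed.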

Travis