Message-ID: <47E683C2.3060004@roguewave.com>
Date: Sun, 23 Mar 2008 10:22:26 -0600
From: Martin Sebor
Organization: Rogue Wave Software, Inc.
To: dev@stdcxx.apache.org
Subject: Re: [Stdcxx Wiki] Update of "LocaleLookup" by MartinSebor
References: <47D83678.9060103@roguewave.com> <47D89A35.7090208@roguewave.com> <16089939.post@talk.nabble.com>
In-Reply-To: <16089939.post@talk.nabble.com>

Travis Vitek wrote:
>
>
> Martin Sebor wrote:
>> But we do need to come up with a sound specification of the query syntax
>> before implementing any more code.
>>
>
> Okay, the proposed query syntax grammar essentially the same as that being
> used for the value in xfail.txt. So we have
>
> is a shell globbing pattern in the format below. All fields
> are required.
>
> iso-country ::= ISO-639-1 or ISO-639-2 two or three character country
> code
> iso-language ::= ISO-3166 two character language code
> iana-codeset ::= IANA codeset name with '-' replaced or removed

Or escaped or quoted? E.g., UTF\-8 or "UTF-8"

If it's all the same to you I would prefer to keep the IANA names unchanged.
A good number of them use the dash to separate two numeric parts of the
name from each other (e.g., ISO-8859-1 and ISO-8859-13) so dropping the
dash would make it difficult to tell one from the other, and replacing
the dash would mean finding a suitable character for the replacement
that's not used in any of the names and that's easy enough to remember
(I suppose the equals sign might qualify if we had to go that route).

>
> match ::=
> ---
> match_list ::= match | match ' ' match_list
>
> So the previous example to select `en_US.*' with a 1 byte encoding or
> `zh_*.UTF-8' with a 2, 3, or 4 byte encoding would use the following query
> string.
>
> en-US-*-1 zh-*-UTF8-2 zh-*-UTF8-3 zh-*-UTF8-4

Okay, this makes it clear that space is an OR. The AND is implicit in
the dash, and there's no need for the '\n'.

>
> This long expression could be written using a brace expansion to simplify
> it.
>
> en-US-*-1 zh-*-UTF8-{2,3,4}
>
> I propose that we not support the BRE syntax, simply because it is so
> complex.

Which part are you suggesting we not support? I ask because I don't
recall us talking about supporting the full BRE or anything beyond
the subset already implemented in rw_fnmatch().

> Yes, it might be quite easy to prototype a solution using grep and
> other shell utilities, but providing a complete implementation in C [where
> we actually need it] is going to be difficult at best. For what we need,
> shell globbing should be sufficient to handle the cases that we need to
> satisfy the objective.
>
> I suppose you could consider en-US-*-1 is "language=en" and "country=US" and
> "codeset=*" and "mb_cur_len=1" so '-' represents an intersection operation,
> but I prefer to think of the entire expression to be either a match or not a
> match.

Sure. I personally don't see a difference between the two from
a practical POV.
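
To make the semantics above concrete, here is a minimal prototype sketch
in Python. It is only an illustration under stated assumptions, not the
stdcxx implementation: the function names (expand_braces, split_spec,
matches) are hypothetical, Python's fnmatch module stands in for
rw_fnmatch(), the field order language-country-codeset-mb_cur_len is
taken from the reading of en-US-*-1 quoted above, and IANA codeset names
are assumed to be kept unchanged (dashes included), which is a preference
rather than a settled decision. Space acts as OR, the dash-separated
fields are ANDed, and {a,b,c} groups are expanded before globbing.

```python
import re
from fnmatch import fnmatchcase

# Innermost {a,b,c} group; nested braces are resolved by recursion.
_BRACES = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern):
    """Expand shell-style {a,b,c} groups into a list of plain patterns."""
    m = _BRACES.search(pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    out = []
    for alt in m.group(1).split(","):
        out.extend(expand_braces(head + alt + tail))
    return out


def split_spec(spec):
    """Split 'language-country-codeset-mb_cur_len' into four fields.

    Language and country are taken from the left and mb_cur_len from the
    right, so a codeset such as ISO-8859-1 may keep its own dashes.
    Raises ValueError if fields are missing (all fields are required).
    """
    language, country, rest = spec.split("-", 2)
    codeset, mb_cur_len = rest.rsplit("-", 1)
    return language, country, codeset, mb_cur_len


def matches(query, spec):
    """Return True if any space-separated alternative in query matches spec."""
    fields = split_spec(spec)
    for alternative in query.split():            # space is OR
        for pattern in expand_braces(alternative):
            globs = split_spec(pattern)
            # Every field must match its glob (the implicit AND).
            if all(fnmatchcase(f, g) for f, g in zip(fields, globs)):
                return True
    return False


if __name__ == "__main__":
    query = "en-US-*-1 zh-*-UTF-8-{2,3,4}"
    for spec in ("en-US-ISO-8859-1-1", "zh-CN-UTF-8-3", "zh-CN-UTF-8-1"):
        print(spec, matches(query, spec))
```

Matching field by field (rather than globbing the whole string at once)
is a design choice made for this sketch: it keeps a '*' in one field from
swallowing dashes that belong to a neighbouring field, which matters only
if the codeset names retain their dashes.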

>
>
> Martin Sebor wrote:
>> I think it's great
>> to put together a prototype at the same time, just as long as it's
>> understood that the prototype might need to change as we discover
>> flaws in it or better ways of doing it.
>>
>
> I have no problem with flaws or small improvements. When we start talking
> about implementing a regular expression parser I get concerned.

I fully agree that implementing regular expressions just for this
project would be overkill. I don't think I ever suggested that we
implement BRE for this though. If I ever mentioned BRE (e.g., on the
wiki) I was referring to the subset used for fnmatch globbing. If
I somehow gave the impression that I was proposing we implement it
now I apologize for confusing things.

Martin