From: Marvin Humphrey
Subject: Re: Charmonizer
Date: Thu, 6 Jul 2006 14:47:28 -0700
To: lucy-dev@lucene.apache.org

Greets,

I think it would be better if the Charmonizer's syntax was 1) a little more C-like, and 2) more concise.
Here's a before-and-after illustrating how I think things should change:

    /* current */
    CH_foo
    number 22
    string a string
    source CH_quote
    #include
    int main() {
        printf("Greetings, earthlings!");
        return 0;
    }
    CH_end_quote
    CH_end_foo

    /* proposed */
    CH_foo('22', 'a string', CH_q
    #include
    int main() {
        printf("Salutations, earthlings!");
        return 0;
    }
    CH_q
    );

I generally prefer labeled parameters to signatures, but they're not very C-like, and the small set of functions Charmonizer provides doesn't benefit from having them. The real reason they're in there is a happy accident: labeled params happened to interact well with parsing line-by-line. They should go away.

If we switch to fixed argument lists, potentially with multiple args on one line, parsing them gets more complicated. The easy way to handle this is to delimit each argument with quotes, but Charmonizer's current quote mechanism is cumbersome:

    /* passing the number 22 to CH_fubar */
    CH_fubar(CH_quote22CH_end_quote);

Therefore, we should add support for arguments delimited by single quotes and separated by commas:

    CH_baz('22', 'twenty-two');

However, we'll keep the policy of no interpolation and no escapes, which keeps the parser extremely small and simple. So if you have a short argument, you use plain old single quotes; if you have a longer argument, or one which needs single quotes inside it, you use the extended quote mechanism.

Speaking of that extended quote mechanism, CH_quote and CH_end_quote should go away, to be replaced by a matching pair of 'CH_q' strings. That way, ' and CH_q are parallel constructs. Plus, Huffman coding suggests that the CH_q delimiter, which will be used often, should be relatively short. It still has to be long and weird enough that it's unlikely ever to occur in stuff you'd want to quote, though.

Beginning all keywords with 'CH_' is a bit heavy, but it serves to remind you that this isn't C, it's Charmonizer.
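To make the proposed rules concrete, here is a minimal C sketch of a parser for a single call in the new syntax: an identifier, a parenthesized, comma-separated argument list, and a terminating semicolon. Each argument is either single-quoted or wrapped in a pair of CH_q delimiters. Because there are no escapes and no interpolation, an argument is just a slice of the input that ends at the first matching delimiter. The names ch_call, ch_arg, ch_parse_call and the limits CH_MAX_ARGS and CH_MAX_NAME are hypothetical, invented for this illustration; this is not Charmonizer's actual code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits, chosen only for this sketch. */
#define CH_MAX_ARGS 8
#define CH_MAX_NAME 64

/* An argument is a slice of the source text. With no escapes and no
 * interpolation, nothing needs to be copied or rewritten. */
typedef struct {
    const char *text;
    size_t len;
} ch_arg;

typedef struct {
    char name[CH_MAX_NAME];
    ch_arg args[CH_MAX_ARGS];
    size_t num_args;
} ch_call;

static const char *skip_ws(const char *p) {
    while (isspace((unsigned char)*p)) p++;
    return p;
}

/* Scan one argument: either '...' or CH_q...CH_q. The first matching
 * delimiter always closes the argument. Returns a pointer just past the
 * closing delimiter, or NULL if p is not at a valid argument. */
static const char *scan_arg(const char *p, ch_arg *arg) {
    static const char qq[] = "CH_q";
    const size_t qlen = sizeof(qq) - 1;
    const char *start;
    const char *end;

    if (*p == '\'') {
        start = p + 1;
        end = strchr(start, '\'');
        if (end == NULL) return NULL;
        arg->text = start;
        arg->len = (size_t)(end - start);
        return end + 1;
    }
    if (strncmp(p, qq, qlen) == 0) {
        start = p + qlen;
        end = strstr(start, qq);
        if (end == NULL) return NULL;
        arg->text = start;
        arg->len = (size_t)(end - start);
        return end + qlen;
    }
    return NULL;
}

/* Parse one call of the form NAME(arg, arg, ...); and return a pointer
 * just past the terminating semicolon, or NULL on a syntax error. */
const char *ch_parse_call(const char *src, ch_call *call) {
    const char *p;
    size_t n = 0;

    assert(src != NULL && call != NULL);
    p = skip_ws(src);

    /* Function name: letters, digits and underscores, e.g. CH_foo. */
    while (isalnum((unsigned char)*p) || *p == '_') {
        if (n + 1 >= CH_MAX_NAME) return NULL;
        call->name[n++] = *p++;
    }
    if (n == 0) return NULL;
    call->name[n] = '\0';
    call->num_args = 0;

    /* Parenthesized, comma-separated argument list (possibly empty). */
    p = skip_ws(p);
    if (*p != '(') return NULL;
    p = skip_ws(p + 1);
    if (*p != ')') {
        for (;;) {
            if (call->num_args == CH_MAX_ARGS) return NULL;
            p = scan_arg(p, &call->args[call->num_args]);
            if (p == NULL) return NULL;
            call->num_args++;
            p = skip_ws(p);
            if (*p != ',') break;
            p = skip_ws(p + 1);
        }
        if (*p != ')') return NULL;
    }

    /* The statement ends with a semicolon rather than CH_end_NAME. */
    p = skip_ws(p + 1);
    if (*p != ';') return NULL;
    return p + 1;
}
```

For example, parsing CH_baz('22', 'twenty-two'); yields the name CH_baz and the two arguments 22 and twenty-two. A CH_q argument keeps whatever sits between its delimiters, including the newline after the opening CH_q and any single quotes inside, which is exactly why the extended form exists.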
Anyone who's done modular programming in C is used to seeing namespaces faked with prefixes, so I think we should keep the 'CH_' prefix.

Closing functions with 'CH_end_function_name' was nice for creating a bulletproof parser, and might have been a useful constraint if there were going to be many such parsers. But if there's only one, it's better to terminate with a semicolon.

Delimiting argument lists with parens is not strictly necessary from a parsing standpoint, but it's clean and familiar.

All of this syntax is extensible. Later, if we want, we can add variables, escapes and interpolation (via " and CH_iq), and more functions. I can think of some functions I'd like to add, so another email's on its way...

Sound good?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/