From: "Michael McCandless (JIRA)"
To: lucy-dev@lucene.apache.org
Date: Fri, 13 Mar 2009 05:29:50 -0700 (PDT)
Subject: [jira] Commented: (LUCY-5) Boilerplater compiler

[ https://issues.apache.org/jira/browse/LUCY-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681691#action_12681691 ]

Michael McCandless commented on LUCY-5:
---------------------------------------

Thanks for all these details, Marvin! I have a better picture now.

I know there are issues with it, but... have you considered simply using C++, which has already layered OO over C (vtables, etc.)? Or are there hopeless problems with its approach for Lucy?

Another question: it seems like you are going to great lengths to achieve "no recompilation back compatibility". Meaning, e.g., if someone has built Python bindings to version X of Lucy, and you've made some otherwise back-compatible changes to the exposed API and released version X+1, you'd like those Python bindings to continue to work without recompilation when someone drops in Lucy X+1 (as a dynamic library), right?

Is this feature really necessary? Couldn't you require that the bindings be rebuilt and recompiled when Lucy X+1 is released? E.g., Lucene just released 2.4.1, and so PyLucene went and regenerated its bindings (using JCC), recompiled, and [almost] released PyLucene 2.4.1.

{quote}
This causes severe runtime memory errors when a compiled extension expects to find a function pointer with a certain signature at a given hard-coded offset, but finds something unexpected and incompatible there instead. However, if we store the offsets into the vtable as variables - a change which seems to have minimal/negligible performance impact - then a compiled extension can adapt to a new vtable layout presented by a recompiled core.
{quote}

But if added methods always went to the end of the vtable, wouldn't things work fine, as long as you had bounds checking so that if new code tried to look up a new method on old compiled code, it would see that it's not there?

bq. Here's the method-invocation wrapper for Scorer_Next.

This seems like a fair amount of overhead per invocation. Is it possible/OK for the caller to grab the next method up front and then invoke it itself? Would "core" scorers be able to somehow bypass this lookup?
{quote}
Each binding will have to implement lucy_Native_callback_i() and a few other methods declared by Native.
{quote}

Native in this case means the dynamic language, right? I.e., lucy_Native_callback_i would invoke my Python method for "next" when I've defined next in Python in my Matcher subclass?

> Boilerplater compiler
> ---------------------
>
>                 Key: LUCY-5
>                 URL: https://issues.apache.org/jira/browse/LUCY-5
>             Project: Lucy
>          Issue Type: New Feature
>          Components: Boilerplater
>            Reporter: Marvin Humphrey
>            Assignee: Marvin Humphrey
>
> Boilerplater is a small compiler which supports a vtable-based object model.
> The output is C code which adheres to the design that Dave Balmain and I
> hammered out a while back; the input is a collection of ".bp" header files.
>
> Our original intent was to pepper traditional C ".h" header files with no-op
> macros to define each class's interface; the code generator would understand
> these macros but the C compiler would ignore them. C source code files would
> then pound-include both the ".h" header and the auxiliary, generated ".bp"
> file.
>
> The problem with this approach is that C syntax is too constraining. Because
> C does not support namespacing, every symbol has to be prepended with a prefix
> to avoid conflicts. Furthermore, adding metadata to declarations (such as
> default values for arguments, or whether NULL is an acceptable value) is
> awkward. The result is ".h" header files that are excessively verbose,
> cumbersome to edit, and challenging to parse visually and to grok.
>
> The solution is to make the ".bp" file the master header file, and write it in
> a small, purpose-built, declaration-only language. The
> code-generator/compiler chews this ".bp" file and spits out a single ".h"
> header file for pound-inclusion in ".c" source code files.
>
> This isn't really that great a divergence from the original plan.
> There's no fixed point at which a "code generator" becomes a "compiler", and
> while the declaration-only header language has a few conventions that core
> developers will have to familiarize themselves with, the same was true for the
> no-op macro scheme. Furthermore, the Boilerplater compiler itself is merely an
> implementation detail; it is not publicly exposed and thus can be modified at
> will. Users who access Lucy via Perl, Ruby, Java, etc. will never see it.
> Even Lucy's C users will never see it, because the public C API itself will be
> defined by a lightweight binding and generated documentation.
>
> The important thing for us to focus on is the *output* code generated by
> Boilerplater. We must nail the object model. It has to be fast. It has to
> live happily as a symbiote within each host. It has to support callbacks into
> the host language, so that users may define custom subclasses and override
> methods easily. It has to present a robust ABI that makes it possible to
> recompile an updated core without breaking compiled extensions (like Java,
> unlike C++).
>
> The present implementation of the Boilerplater compiler is a collection of
> Perl modules: Boilerplater::Type, Boilerplater::Variable,
> Boilerplater::Method, Boilerplater::Class, and so on. One CPAN module is
> required, Parse::RecDescent; however, only core developers will need either
> Perl or Parse::RecDescent, since public distributions of Lucy will
> contain pre-generated code. Some of Boilerplater's modules have kludgy
> internals, but on the whole they seem to do a good job of throwing errors
> rather than failing subtly.
>
> I expect to submit individual Boilerplater modules using JIRA sub-issues which
> reference this one, to allow room for adequate commentary.