From: Sylvain Wallez
Date: Sun, 28 Dec 2008 13:23:41 +0100
To: dev@cocoon.apache.org
Subject: Re: [C3] StAX research reveiled!

Steven Dolg wrote:
> Basically you're providing a buffer between every pair of components
> and fill it as needed.

Yes. Now this buffer will always contain a very limited number of events, corresponding to the result of processing an amount of input data that is convenient to process at once to avoid complex state management (e.g. a tag with all its children). So most of the time, this buffer will contain just one event.

Think of it as just a bridge between the writer view used by a producer and the reader view used by its consumer. In my opinion, these are the most convenient views for writing StAX components.

> But you need to implement both XMLStreamWriter and XMLStreamReader and
> optimize that for any possible thing a transformer might do.
> In order to buffer all the data from the components you will have to
> create some objects as well - I guess you will end up with something
> like the XMLEvent and maintaining a list of them in the StaxFIFO.
> That's why I think an efficient (as in faster than the Event API)
> implementation of the StaxFIFO is difficult to make.

It's certainly less trivial than maintaining a list of events, but it should be doable quite efficiently by using an int FIFO (to store event types and attribute counts) and a String FIFO (for everything else). I'll try to find a couple of hours to prototype this.

> On the other hand I do think that the cursor API is quite a bit harder
> to use.
> As stated in the Javadoc of XMLStreamReader it is the lowest level for
> reading XML data - which usually means more logic in the code using
> the API and more knowledge in the head of the developer
> reading/writing the code is required.
> So I second Andreas' statement that we will sacrifice simplicity for
> (a small amount of ?) performance.

I understand your point, even if I don't totally agree :-) Note, however, that my proposal still stands even with events: just replace XMLStream{Reader|Writer} with XMLEvent{Reader|Writer}.

> The other thing is that - at least the way you suggested - we would
> need a special implementation of the Pipeline interface.
> That is something that compromises the intention behind having a
> Pipeline API.
> Right now we can use the new StAX components and simply put them into
> any of the Pipeline implementations we already have.
> Sacrificing this is completely out of the question IMO.

Actually, I wonder whether wanting a single API is wishful thinking that will, in the end, lead to something overly abstract, and hence difficult to understand and use, or to an abstraction through which the underlying implementations leak.

There is already some impedance mismatch appearing between pull and push in the code:
- a StAXGenerator has to call initiatePullProcessing() on its consumer, which in turn has to call it on its own consumer, and so on until we reach the Finisher, which finally starts pulling events. This moves a responsibility that belongs to the pipeline down to its components.
- an AbstractStAXProducer only accepts a StAXConsumer, defeating the idea of a unified pipeline implementation that accepts everything.

So we should either have several APIs specifically tailored to the underlying push or pull model, or make sure that the unified API and its implementations accept any kind of component and set up the appropriate conversion bridges between them.

Sylvain

-- 
Sylvain Wallez - http://bluxte.net
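The "int FIFO + String FIFO" idea for the StaxFIFO buffer mentioned above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the prototype announced in the thread: the class name StaxFifoSketch and its methods are invented here and only mimic a tiny subset of XMLStreamWriter/XMLStreamReader (elements, attributes and character data; no namespaces, comments or processing instructions). The event-type codes are the real constants from javax.xml.stream.XMLStreamConstants.

```java
import java.util.ArrayDeque;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLStreamConstants;

/** Hypothetical StAX FIFO: an int queue (event types, attribute counts) plus a String queue (everything else). */
public final class StaxFifoSketch {
    // A tuned version would use a primitive int ring buffer to avoid boxing.
    private final ArrayDeque<Integer> ints = new ArrayDeque<>();
    private final ArrayDeque<String> strings = new ArrayDeque<>();
    // Writer side: elements opened but not yet closed, so writeEndElement() needs no name.
    private final ArrayDeque<String> openElements = new ArrayDeque<>();

    // Reader side: state of the current event, reused between calls to next().
    private int eventType = -1;
    private String localName;
    private String text;
    private String[] attributes = new String[8]; // name0, value0, name1, value1, ...
    private int attributeCount;

    // ---- Writer view, used by the producer ----
    public void writeStartElement(String name, String... nameValuePairs) {
        if (nameValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("attributes must be name/value pairs");
        }
        ints.add(XMLStreamConstants.START_ELEMENT);
        ints.add(nameValuePairs.length / 2);
        strings.add(name);
        for (String s : nameValuePairs) {
            strings.add(s);
        }
        openElements.push(name);
    }

    public void writeCharacters(String data) {
        ints.add(XMLStreamConstants.CHARACTERS);
        strings.add(data);
    }

    public void writeEndElement() {
        String name = openElements.pop(); // throws NoSuchElementException if no element is open
        ints.add(XMLStreamConstants.END_ELEMENT);
        strings.add(name);
    }

    // ---- Reader view, used by the consumer ----
    public boolean hasNext() {
        return !ints.isEmpty();
    }

    public int next() {
        if (ints.isEmpty()) {
            throw new NoSuchElementException("buffer empty: the producer must write more events");
        }
        eventType = ints.poll();
        localName = null;
        text = null;
        attributeCount = 0;
        switch (eventType) {
            case XMLStreamConstants.START_ELEMENT:
                attributeCount = ints.poll();
                localName = strings.poll();
                if (attributes.length < 2 * attributeCount) {
                    attributes = new String[2 * attributeCount];
                }
                for (int i = 0; i < 2 * attributeCount; i++) {
                    attributes[i] = strings.poll();
                }
                break;
            case XMLStreamConstants.END_ELEMENT:
                localName = strings.poll();
                break;
            case XMLStreamConstants.CHARACTERS:
                text = strings.poll();
                break;
            default:
                throw new IllegalStateException("unsupported event type " + eventType);
        }
        return eventType;
    }

    public String getLocalName() { return localName; }
    public String getText() { return text; }
    public int getAttributeCount() { return attributeCount; }

    public String getAttributeLocalName(int index) { checkIndex(index); return attributes[2 * index]; }
    public String getAttributeValue(int index) { checkIndex(index); return attributes[2 * index + 1]; }

    private void checkIndex(int index) {
        if (index < 0 || index >= attributeCount) {
            throw new IndexOutOfBoundsException("attribute index " + index);
        }
    }
}
```

In use, a producer would call the write* methods for one convenient chunk of input (often a single event), and its consumer would then drain the buffer with next() until hasNext() returns false. This matches the "mostly one event" buffer described in the message: no per-event objects are created, only the strings that the events carry anyway.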
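For the first point about initiatePullProcessing(), here is a sketch of the alternative in which the pipeline itself owns the pull loop, using only the standard javax.xml.stream API. A hypothetical static run() method, standing in for a pipeline implementation, pulls events from an XMLStreamReader and replays them on an XMLStreamWriter. It therefore acts as a pull-to-push conversion bridge, and no component has to ask its consumer to start pulling. The class and method names are invented for this example, and only elements, attributes and character data are copied.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical sketch: the pipeline, not the generator, drives the pull loop. */
public final class PullDrivingPipelineSketch {

    /** Pulls every event from the source (pull side) and pushes it into the sink (push side). */
    public static void run(XMLStreamReader source, XMLStreamWriter sink) throws XMLStreamException {
        while (source.hasNext()) {
            switch (source.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    sink.writeStartElement(source.getLocalName());
                    for (int i = 0; i < source.getAttributeCount(); i++) {
                        sink.writeAttribute(source.getAttributeLocalName(i), source.getAttributeValue(i));
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    sink.writeCharacters(source.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    sink.writeEndElement();
                    break;
                default:
                    // Namespaces, comments, PIs, document events etc. are ignored in this sketch.
                    break;
            }
        }
        sink.flush();
    }

    /** Convenience wrapper: runs the loop over an in-memory document and returns the serialized result. */
    public static String process(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            run(reader, writer);
            writer.close();
            reader.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Transformers could be slotted in between the reader and the writer (for instance, as reader wrappers), but the point of the sketch is only where the loop lives: in the pipeline, which keeps that responsibility out of the individual components.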