cocoon-dev mailing list archives

From Reinhard Pötz <>
Subject Re: [C3] StAX research reveiled!
Date Sun, 28 Dec 2008 13:55:18 GMT
Steven Dolg wrote:
> Sylvain Wallez wrote:
>> <snip/>
>> Steven Dolg wrote:
>>> Basically you're providing a buffer between every pair of components
>>> and fill it as needed.
>> Yes. Now this buffer will always contain a very limited number of
>> events, corresponding to the result of processing an amount of input
>> data that is convenient to process at once to avoid complex state
>> management (e.g. an <i18:text> tag with all its children). And so most
>> often, this buffer will contain just one event.
>> Think of it as being just a bridge between the writer view used by a
>> producer and the reader view used by its consumer. These are in my
>> opinion the most convenient views to write StAX components.
>>> But you need to implement both XMLStreamWriter and XMLStreamReader
>>> and optimize that for any possible thing a transformer might do.
>>> In order to buffer all the data from the components you will have to
>>> create some objects as well - I guess you will end up with something
>>> like the XMLEvent and maintaining a list of them in the StaxFIFO.
>>> That's why I think an efficient (as in faster than the Event API) 
>>> implementation of the StaxFIFO is difficult to make.
>> It's certainly less trivial than maintaining a list of events, but
>> should be doable quite efficiently by using an int FIFO (to store
>> event types and attribute counts) and a String FIFO (for everything
>> else). I'll try to find a couple of hours to prototype this.
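
A minimal sketch of the two-FIFO idea described above, to make the data layout concrete. It is not the StaxFIFO from this thread: the class name, the method names and the restriction to three event types are assumptions made for illustration, and namespaces are ignored. Event types and attribute counts go into a primitive int queue, everything else into a String queue, so no per-event object is allocated.

```java
import java.util.ArrayDeque;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLStreamConstants;

/** Hypothetical sketch of a writer/reader bridge backed by an int FIFO and a String FIFO. */
final class TwoQueueFifoSketch {

    /** Growable ring buffer of primitive ints, so event types are never boxed. */
    static final class IntFifo {
        private int[] data = new int[16];
        private int head;
        private int size;

        void add(int value) {
            if (size == data.length) {
                // Full: copy into a buffer twice as large, oldest entry first.
                int[] bigger = new int[data.length * 2];
                for (int i = 0; i < size; i++) {
                    bigger[i] = data[(head + i) % data.length];
                }
                data = bigger;
                head = 0;
            }
            data[(head + size) % data.length] = value;
            size++;
        }

        int remove() {
            if (size == 0) {
                throw new NoSuchElementException("FIFO is empty");
            }
            int value = data[head];
            head = (head + 1) % data.length;
            size--;
            return value;
        }

        boolean isEmpty() {
            return size == 0;
        }
    }

    private final IntFifo ints = new IntFifo();
    private final ArrayDeque<String> strings = new ArrayDeque<>();

    // State of the event most recently returned by next().
    private String nameOrText;
    private String[] attributes = new String[0];

    // ---- writer view, used by the producer ----

    void writeStartElement(String localName, String... attributeNameValuePairs) {
        if (attributeNameValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("attributes must be name/value pairs");
        }
        ints.add(XMLStreamConstants.START_ELEMENT);
        ints.add(attributeNameValuePairs.length / 2); // attribute count
        strings.add(localName);
        for (String s : attributeNameValuePairs) {
            strings.add(s);
        }
    }

    void writeCharacters(String text) {
        ints.add(XMLStreamConstants.CHARACTERS);
        strings.add(text);
    }

    void writeEndElement(String localName) {
        ints.add(XMLStreamConstants.END_ELEMENT);
        strings.add(localName);
    }

    // ---- reader view, used by the consumer ----

    boolean hasNext() {
        return !ints.isEmpty();
    }

    /** Advances to the next buffered event and returns its XMLStreamConstants type. */
    int next() {
        int eventType = ints.remove();
        int attributeCount = (eventType == XMLStreamConstants.START_ELEMENT) ? ints.remove() : 0;
        nameOrText = strings.remove();
        attributes = new String[attributeCount * 2];
        for (int i = 0; i < attributes.length; i++) {
            attributes[i] = strings.remove();
        }
        return eventType;
    }

    String getLocalName() { return nameOrText; }
    String getText() { return nameOrText; }
    int getAttributeCount() { return attributes.length / 2; }
    String getAttributeLocalName(int index) { return attributes[2 * index]; }
    String getAttributeValue(int index) { return attributes[2 * index + 1]; }
}
```

A producer would call the write methods and the consumer would drain the buffer with hasNext()/next(). A real implementation would still have to cover the rest of XMLStreamWriter and XMLStreamReader (namespaces, comments, processing instructions), which is where the effort discussed in this thread lies.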
>>> On the other hand I do think that the cursor API is quite a bit
>>> harder to use.
>>> As stated in the Javadoc of XMLStreamReader it is the lowest level
>>> for reading XML data - which usually means more logic in the code
>>> using the API and more knowledge in the head of the developer
>>> reading/writing the code is required.
>>> So I second Andreas' statement that we will sacrifice simplicity for
>>> (a small amount of ?) performance.
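
To make the cursor/event comparison above concrete, here is a small sketch that does the same job with both standard StAX APIs: collecting the local names of all start elements. The class and method names are made up for illustration; only the javax.xml.stream calls are real. It says nothing about performance, it only shows how the calling code differs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical comparison of the StAX cursor API and the StAX event API. */
final class CursorVsEventSketch {

    /** Cursor API: the reader itself is the current event, identified by an int type. */
    static List<String> namesWithCursor(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // getLocalName() is only valid while positioned on an element event.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    /** Event API: every step yields an XMLEvent object that can be kept or passed on. */
    static List<String> namesWithEvents(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

With the cursor API the caller has to know which getters are legal for the current event type, whereas the event API hands out self-describing objects at the cost of one allocation per event.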
>> I understand your point, even if I don't totally agree :-) Now it
>> should be mentioned that even with events, my proposal still
>> stands: just replace XMLStream{Reader|Writer} with
>> XMLEvent{Reader|Writer}.
>>> The other thing is that - at least the way you suggested - we would
>>> need a special implementation of the Pipeline interface.
>>> That is something that compromises the intention behind having a
>>> Pipeline API.
>>> Right now we can use the new StAX components and simply put them into
>>> any of the Pipeline implementations we already have.
>>> Sacrificing this is completely out of the question IMO.
>> Actually, I'm wondering if wanting a single API is not wishful
>> thinking and will in the end lead to something that is overly abstract
>> and hence difficult to understand and use, or where underlying
>> implementations will leak in the high-level abstraction.
>> There is already some impedance mismatch appearing between pull and
>> push in the code:
>> - a StAXGenerator has to call initiatePullProcessing() on its
>> consumer, which in turn will have to call it on its own consumer, etc
>> until we reach the Finisher that will finally start pulling events.
>> This moves a responsibility that belongs to the pipeline down to its
>> components.
> Well I don't see the problem with that.
> From the pipeline's point of view those are normal components just like
> all the others.
> The pipeline was never intended to "care" about the internals of the
> components - so why should it matter that the StAXGenerator calls
> "initiatePullProcessing" on its consumer instead of calling some other
> method like e.g. "startDocument"?
>> - an AbstractStAXProducer only accepts a StAXConsumer, defeating the
>> idea of a unified pipeline implementation that will accept everything.
> The idea was to have pipelines being capable of processing virtually any
> data.
> But that is not the same as combining components in an arbitrary way,
> e.g. there is no sense in linking a FileGenerator with a (not yet
> existing) ImageTransformer based on Java's Imaging API.
> The components must be "compatible" - that is they must understand the
> data they exchange with each other.
> We may however provide some adapters/converters to make certain "types"
> of components compatible, e.g. SAX <--> StAX.
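
As an illustration of what such an adapter could look like in the SAX --> StAX direction, here is a minimal sketch using only the JDK's SAX and StAX classes. It is not an existing Cocoon component: the class name and the roundTrip helper are hypothetical, and it assumes a parser that is not namespace-aware and input without prefixes, comments or processing instructions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical adapter: receives SAX callbacks (push) and forwards them to a StAX writer. */
final class SaxToStaxBridgeSketch extends DefaultHandler {

    private final XMLStreamWriter writer;

    SaxToStaxBridgeSketch(XMLStreamWriter writer) {
        this.writer = writer;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        try {
            // Assumption: no namespace handling, so the qualified name is written as-is.
            writer.writeStartElement(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                writer.writeAttribute(atts.getQName(i), atts.getValue(i));
            }
        } catch (XMLStreamException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            writer.writeCharacters(ch, start, length);
        } catch (XMLStreamException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        try {
            writer.writeEndElement();
        } catch (XMLStreamException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        try {
            writer.writeEndDocument();
            writer.flush();
        } catch (XMLStreamException e) {
            throw new SAXException(e);
        }
    }

    /** Parses the XML with SAX and re-serializes it through the bridge. */
    static String roundTrip(String xml) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new SaxToStaxBridgeSketch(writer));
            writer.close();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Going from push to a push-style writer is the easy direction. The reverse (StAX --> SAX) needs a loop that pulls from a reader and fires ContentHandler callbacks, and bridging a SAX producer to a pull consumer needs buffering or a second thread.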
>> So we should either have several APIs specifically tailored to the
>> underlying push or pull model, or make sure the unified API and its
>> implementations accept any kind of component and set the appropriate
>> conversion bridges between them.
> As I tried to state above: that will not be possible for every
> conceivable combination of components.
> At least not when thinking beyond XML - which I do.

Steven was faster than me but his comments are the same that I wanted to

Reinhard Pötz                           Managing Director, {Indoqa} GmbH

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member        
