cocoon-dev mailing list archives

From Steven Dolg <steven.d...@indoqa.com>
Subject Re: [C3] StAX research reveiled!
Date Tue, 13 Jan 2009 15:28:40 GMT
Grzegorz Kossakowski wrote:
> Jakob Spörk wrote:
>   
>> Hello,
>>     
>
> Hello Jakob,
>
>   
>> I just want to give my thoughts on the unified pipeline and data conversion
>> topic. In my opinion, the pipeline can't do the data conversion, because it
>> has no information about how to do this. Let's take a simple example: we
>> have a pipeline processing XML documents that describe images. The first
>> components process this XML data while the rest of the components do
>> operations on the actual image. Now the question is: who will transform the
>> XML data to image data in the middle of the pipeline?
>>     
>
> I agree with you that the pipeline implementation should not handle data
> conversion because there is no generic way to handle it.
>
> Now I would like to answer your question: it should be another /pipeline
> component/ that handles data conversion.
>
>   
>> I believe the pipeline cannot do this, because it simply does not know how
>> to transform; that's a custom operation. You would need a component that is
>> an XML consumer on the one hand and an image producer on the other.
>> Providing some automatic data conversions directly in the pipeline may help
>> developers who need exactly these default cases, but I believe it would make
>> things harder for people requiring custom data conversions (and those are
>> most of the cases).
>>
>> The current architecture allows any components to be fitted into the
>> pipeline, and only the components themselves have to know whether they can
>> work with their predecessor or the component following them. That allows
>> the most flexibility when thinking about any possible conversions. If a
>> pipeline should do this, you would need "plug-ins" for the pipeline that
>> are registered and allow the pipeline to do the conversions. But then it is
>> the responsibility of the developer to register the right conversion
>> plug-ins, and you would get new problems if a pipeline requires two
>> different converters from the same source type to the same target type,
>> because the pipeline cannot automatically know which converter to use in
>> which situation.
>>     
>
> I believe that these problems could be addressed by... the compiler. In my
> opinion, pipelines should be type-safe, which basically means that for a
> given pipeline fragment you know what it expects as input and what kind of
> output it gives you. The same goes for components. This eliminates the
> "flexibility" of having a component that accepts more than one kind of
> input or produces more than one kind of output. I believe that having more
> than one output or more than one input only adds complexity and does not
> solve any problem.
>
> If a component were going to accept more than one kind of input, how could
> a user know the list of accepted inputs? I guess the only way to find out
> would be checking the source and looking for all "instanceof" statements in
> its code.
>   
The same way as in Cocoon 2.2, I guess.
Users have to know that a FileReader must not be followed by any
component, that the Serializer must be the last component of the
pipeline, and that the Generator must be the first.
Currently users don't need to actually read the source code to find that
out, and I don't see why this would need to change.
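
Put into (made-up) code, this is roughly how I read the current situation -
the names below are invented for illustration and are not the actual C3 API:

/** Marker interface: a generator, transformer or serializer alike. */
interface PipelineComponent {
}

final class SimplePipeline {

    private final java.util.List<PipelineComponent> components =
            new java.util.ArrayList<PipelineComponent>();

    public void addComponent(PipelineComponent component) {
        // Accepts anything: that the first component has to be a generator
        // and the last one a serializer is knowledge the user brings along,
        // not something the compiler can check here.
        components.add(component);
    }
}

The ordering rules live in the documentation and in the user's head, and so
far that has worked.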


Of course the user of a pipeline needs to know which components he uses
and he needs to know which combinations of components actually make sense.
But I also expect him to know what the components he selected do and
whether they are compatible or not.
It's not like we're building SAX components that cannot be combined with
each other, or StAX components that won't work with some other StAX
component.

That image data represented as a bunch of bytes cannot be passed to a
SAX transformer is something I expect someone using Cocoon to know,
just as I expect a certain knowledge of relational databases from someone
using an O/R mapper.
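
Just to make that concrete with a rough sketch - the interfaces below are
invented for illustration, not actual C3 contracts. A SAX-oriented component
and an image-oriented component simply don't plug into each other, and the
component Jakob describes - an XML consumer towards its predecessor, an image
producer towards its successor - is exactly the piece that bridges the two:

import java.awt.image.BufferedImage;
import org.xml.sax.ContentHandler;

/** A component that consumes an XML description of an image as SAX events. */
interface XMLImageConsumer {
    ContentHandler getContentHandler();
}

/** A component that works on the decoded image itself. */
interface ImageProcessor {
    BufferedImage process(BufferedImage image);
}

/**
 * The converter: an XML consumer on one side, an image producer on the
 * other. Only this component knows how the XML vocabulary maps to pixels;
 * neither the pipeline nor the neighbouring components need that knowledge.
 */
interface XMLToImageConverter extends XMLImageConsumer {
    BufferedImage getImage();
}
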
> I would prefer a situation where components have well-defined types of
> input and output, and if you want to combine components whose input-output
> pairs do not match, you should add converters as intermediate components.
>
> I've been thinking about generic but at the same time type-safe pipelines
> for some time. I've designed them on paper and everything looked quite
> promising. Then I moved to implementing my ideas and got a rather
> disappointing result, which can be seen here:
> http://github.com/gkossakowski/cocoonpipelines/tree/master
>
> The most interesting files are:
> http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline/Pipeline.java
> (generic and type-safe pipeline interface)
>
> http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline/PipelineComponent.java
> (generic and type-safe component def.)
>
> http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline/demo/RunPipeline.java
> (shows how to use that thing)
>   
The URLs above only return "Nothing to see here yet. Move along."...
Am I doing something wrong?
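
In case I'm guessing the right direction anyway: this is how I would picture
a generic, type-safe component and pipeline in plain Java generics. All names
here are my own invention - I obviously can't compare them with the code
behind those URLs:

/** A component declares the type it consumes and the type it produces. */
interface TypedComponent<I, O> {
    O process(I input);
}

/** A pipeline fragment is itself typed by what it expects and what it yields. */
final class TypedPipeline<I, O> {

    private final TypedComponent<I, O> chain;

    private TypedPipeline(TypedComponent<I, O> chain) {
        this.chain = chain;
    }

    public static <A, B> TypedPipeline<A, B> start(TypedComponent<A, B> first) {
        return new TypedPipeline<A, B>(first);
    }

    /** Appending compiles only if the next input type matches our output type. */
    public <N> TypedPipeline<I, N> then(final TypedComponent<O, N> next) {
        final TypedComponent<I, O> previous = this.chain;
        return new TypedPipeline<I, N>(new TypedComponent<I, N>() {
            public N process(I input) {
                return next.process(previous.process(input));
            }
        });
    }

    public O execute(I input) {
        return chain.process(input);
    }
}

With such a shape, then() simply refuses to compile when the output type of
one component doesn't match the input type of the next - which, I assume, is
where your intermediate converters would come in.
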
>   
>> The only thing Cocoon can help with here is to provide as many "standard"
>> converters as possible, but it is still the responsibility of the
>> developer to use the right ones.
>>     
>
> I think Cocoon could define a much better, type-safe Pipeline API, but we
> are in the unfortunate situation that we are using a language that makes it
> extremely hard to express this kind of generic solution.
>
> Of course, I would like to be proven wrong and shown that Java is powerful
> enough to let us express our ideas and solve our problems.
Actually I'm not sure which problems those are - I'm sure we all have
slightly different views on all this.
Some of the suggestions are actually hard for me to comprehend since I
do not know which problem(s) they are trying to address.

I agree that we should try to avoid sources of mistakes as much as we can.
But trying to build a fail-proof API usually causes more harm than good IMO.

> Actually, the whole idea of a pipeline is not rocket science, as it is, in
> essence, just ordinary function composition. The only unique property of
> pipelines I can see is that we want access to _partial_ results of the
> pipeline execution so we can make it streamable.
>   
What "_partial_ results" would you like to get from the pipeline?
And what for?
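
If it really is just ordinary function composition, then what I picture is
something like this (invented names, just to frame my question):

interface Stage<I, O> {
    O apply(I input);
}

final class ComposeDemo {
    public static void main(String[] args) {
        Stage<String, Integer> parse = new Stage<String, Integer>() {
            public Integer apply(String input) {
                return Integer.valueOf(input.trim());
            }
        };
        Stage<Integer, Boolean> isEven = new Stage<Integer, Boolean>() {
            public Boolean apply(Integer input) {
                return Boolean.valueOf(input.intValue() % 2 == 0);
            }
        };
        // The intermediate Integer never leaves the composition; the caller
        // only ever sees the final Boolean.
        System.out.println(isEven.apply(parse.apply(" 42 ")));   // prints "true"
    }
}

There the intermediate values never leave the composition and the caller only
sees the end result, which is why I'm asking.
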
> This became more of a brain-dump than a real answer to your e-mail, Jakob,
> but I hope you (and others) got my point.
>
>   

