cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: AW: [PROPOSAL] Cocoon Science Fiction
Date Mon, 10 Feb 2003 17:18:23 GMT
Hochsteger Andreas /INFO-MA wrote:

>>While I understand your concept, I strongly disagree: SAX provides a 
>>multidimensional structured data space which is suitable for 
>>*any* kind of data structure.
> 
> 
> That's interesting.
> Do you mean namespaces by multidimensional structured data space?

yes

> But I doubt that placing binary or non-XML/SAX Text inside of structured
> XML-Tags will solve it all ;-)

of course. but again, I don't think cocoon should try to do everything.

>>True, maybe not as efficiently as other formats, but removing a fixed 
>>contract between pipeline components will require a pluggable and 
>>metadata-driven parsing/serialization stage between each component.
>>
>>I don't see any value of this compared to the current approach of SAX 
>>adaptation of external data to the internal model.
> 
> 
> Perhaps you misunderstand something here.
> I don't want to change the way, Cocoon handles SAX events right now.
> It's more about how we could handle non-SAX data streams a bit better.

cocoon does not handle non-SAX data streams (besides readers, but I
don't want to see them turned into pipelines).

>>>I'm sure some of you wanted to be able to build applications the same
>>>way Unix shell pipes work. Cocoon was a big step in this direction, but
>>>it was only applicable for processing XML data.
>>
>>*only XML* is misleading. *based on SAX* is the sentence. I've never 
>>perceived this as a limitation, but as a paradigm shift.
> 
> 
> Agreed.
> But the real world is not SAX-based and some better way to handle non-SAX
> data streams is demanded.

Great, but what does Cocoon have to do with this?

>>Topologically speaking, the solution space is rotated, but its size is 
>>not reduced.
>>
>>
>>>There are so many cases where pipeline processing of data (no matter if
>>>it is XML, plain text or binary data) is done today but we are lacking a
>>>generic and declarative way to unify these processing steps. Cocoon is
>>>best suited for this task through its clean and easy to understand yet
>>>powerful pipeline concept.
>>
>>If you want to create pipelines for general data, why use Cocoon? Just 
>>use the UNIX pipe or use servlet filters or apache 2.0 modules or any 
>>type of 'byte-oriented' (thus un-structured data) pipe&filters modules.
> 
> 
> This way I lose the great descriptive concept of Cocoon pipelines and the
> integration with it.

Integration with what? Cocoon has components that are heavily XML
oriented. Providing components for other types of data streams will not
make them interoperable, since other data streams will have different
realms and different needs.

As far as descriptive concepts go, nobody stops you from using the same
markup we use in the sitemap to describe your other pipelines in another
framework targeted at other types of data.

> 
>>If you remove the structure from the pipeline data that flows, Cocoon 
>>will not be Cocoon anymore. This is not evolution, it is extinction.
> 
> 
> Same misunderstanding as above.
> As I pointed out in "11 Converting old sitemaps to new sitemaps" the
> components dealing with "/text/xml" are not very different from those
> available today.
> I don't want to remove the structure from the data through the pipeline in
> any way.

Good.

>>>4 Pipeline Types
>>>================
>>>
>>>I tried to design several pipeline variants but after thinking a while
>>>they all were still too limited for the way I wanted them to work.
>>>
>>>So here's another try by giving some hypotheses first:
>>>1. A pipeline can produce data
>>>2. A pipeline can consume data
>>>3. A pipeline can convert data
>>>4. A pipeline can filter data
>>>5. A pipeline can accept a certain data format as input
>>>6. A pipeline can produce a certain data format as output
>>>7. Pipeline components follow the same hypotheses (1-6)
>>>8. Only pipeline components with compatible data formats can be
>>>arranged next to each other
>>
>>Ah, here you hint that you don't want to remove data structured-ness in 
>>the pipeline, just want to add *other* data structures besides SAX
>>events.
> 
> 
> Yes, that's what I want to do...
> 
> 
>>Ok, this is worth investigating.
> 
> 
> [snip]
> 
> 
>>>5 Data Formats
>>>==============
>>>
>>>With "data format" I mean something like XML, plain text, png, mp3, ...
>>>I'm not yet really sure here, how we should specify data formats, so
>>>I'll try to start with some requirements:
>>>1. They should be easy to remember and to specify ;-)
>>>2. It should be possible to create derived data formats (-> inheritance)
>>>3. It should be possible to specify additional information (e.g. MIME
>>>type, DTD/Schema for XML, ...)
>>>4. Pipelines which accept a certain data format as input can be fed
>>>with derived data formats
>>>5. We should not reinvent standards, which are already suited for this
>>>task (but I fear, there does not yet exist something suitable)
>>
>>You are asking for a very abstract parsing grammar. Note, however, that 
>>it is pretty easy to point to examples where these grammars will have to
>>be so complex that maintaining them would be a nightmare.
> 
> 
> I don't think, that this grammar is very complex.
> See "5.1 Data Format Definition".
> It only consists of <data:format .../> with optional parameters.

that doesn't take into consideration the multidimensionality of the
content that cocoon is going to operate on.

> 
>>Think of a BNF-like grammar that is able to explain concepts like XML 
>>namespacing or HyTime Architectural Forms.
>>
>>
>>>To make it easier for us to begin with the task of defining data
>>>formats, let's assume, we have three basic data formats called
>>>"abstract", "binary" and "text". The format "abstract" will be explained
>>>later, but "binary" and "text" should be clear to everyone.
>>
>>Binary and text are unstructured data streams. You are falling back.
> 
> 
> We don't fall back, since the structuredness is kept for XML.
> We only gain the additional possibility to process unstructured data
> streams.

No, in your architecture, there is no way to define that a pipeline
outputs formatting objects which contain SVG figures. This is a
drawback, unless you start providing a new datatype for all possible
combinations of namespaces (yuck!)

This is the reason why we do not describe pipelines with their
input/output properties in Cocoon. This was proposed a while ago and
turned down because of these multi-dimensional problems.
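
To make the multi-dimensionality concrete, here is a minimal sketch. It is not Cocoon code: Python's `xml.sax` stands in for the Java SAX API, the sample document is made up, `derives_from` is a hypothetical helper for the path-style formats of the proposal, and the path "/text/xml/fo" is an assumed name that the proposal never defines. It collects the namespaces present in one SAX stream and then checks whether a single format path could name that stream.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

FO = "http://www.w3.org/1999/XSL/Format"
SVG = "http://www.w3.org/2000/svg"

# Illustrative document: formatting objects that embed an SVG figure.
DOC = (
    f'<fo:root xmlns:fo="{FO}" xmlns:svg="{SVG}">'
    "<fo:instream-foreign-object>"
    '<svg:svg><svg:rect width="10" height="10"/></svg:svg>'
    "</fo:instream-foreign-object>"
    "</fo:root>"
).encode("utf-8")


class NamespaceCollector(ContentHandler):
    """Records every namespace URI that appears on an element."""

    def __init__(self):
        super().__init__()
        self.namespaces = set()

    def startElementNS(self, name, qname, attrs):
        uri, _local = name
        self.namespaces.add(uri)


def namespaces_in(document: bytes) -> set:
    """Parses the document with namespace processing on and returns its namespaces."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = NamespaceCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return handler.namespaces


def derives_from(fmt: str, base: str) -> bool:
    """Hypothetical rule: '/text/xml/svg' derives from '/text/xml'."""
    return fmt == base or fmt.startswith(base.rstrip("/") + "/")


found = namespaces_in(DOC)
print(sorted(found))

# The stream is FO *and* SVG at once, but neither path derives from the other,
# so one hierarchical name cannot describe it.
print(derives_from("/text/xml/fo", "/text/xml/svg"))
print(derives_from("/text/xml/svg", "/text/xml/fo"))
```

The stream carries a set of namespaces, while a path-style format names exactly one branch of a tree. Covering every set would need one format per combination.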

>>>5.1 Data Format Definition
>>>--------------------------
>>>
> 
> 
> [snip]
> 
> 
>>>5.3 A word about MIME Types
>>>---------------------------
>>>
>>>If you ask me, why don't I use the standardized MIME types (see [2]) to
>>>specify data formats, I can give you the following reasons:
>>>MIME types fulfill the requirements from above just partly. They just
>>>support two levels of classification and they are purpose-oriented. The
>>>data formats I suggest are therefore content-oriented (/text/xml/svg vs.
>>>image/svg-xml). So both serve different purposes.
>>>
>>>I know the importance of supporting the MIME type standard, and so the
>>>parameter 'mime-type' is part of the super data format 'any' and thus is
>>>available for every other data format too. By specifying a certain data
>>>format, you always have a MIME type associated, in the worst case the
>>>MIME type from the super data format 'any' (application/octet-stream) is
>>>used.
>>
>>From what I see so far, you are describing nothing different (from an 
>>architectural point of view) from what we already have.
> 
> 
> That's not what I wanted to do.
> 
> 
>>>5.4 Data Handlers
>>>-----------------
>>>
>>>I'm not very sure, what the data handlers actually do, but I can think
>>>of either defining an interface, which must be implemented by the
>>>pipeline components which operate with a certain data format (do we need
>>>two handlers here: input-handler and output-handler?) or they are
>>>concrete components which can be used by the pipeline components to
>>>consume or produce this data format. I think some discussion on this
>>>topic might not be bad.
>>
>>Here you hit the nerve.
>>
>>If you plan on having a different interface of data-handling for each 
>>data-type (or data-type family), the permutation of components will kill 
>>you.
> 
> 
> Yes, I was aware of this problem.
> That's why I'm very interested to hear your comments ;-)
> 
> But what I don't mean here is an interface for each data type.
> I rather mean to provide a reusable component which knows how to deal with a
> certain data format.
> This component can be used from other pipeline components.

This component has a name: parser. Then a parser has to come up with
something, and this something is normally an object model. Then you have
to adapt your object model to some contract that other components will
have to agree upon. Then you'll find out that these object model +
parsing + serialization stages are awfully slow and memory consuming.

> But I have not thought about it very much yet.

Sorry, but it shows :)

> 
>>>5.5 Data Format Determination
>>>-----------------------------
>>>
>>>In many cases, I've written the input- and output-format along with the
>>>pipeline components, but it is also possible to specify them in the 
>>><map:components/> section or implicitly by implementing a certain
>>>component interface and therefore omitting it in the pipeline.
>>>
>>>Here's a suggested order of data format determination:
>>>
>>>1. Input-/output-Format specified directly with a pipeline component
>>>	<map:produce type="uri" ref="docs/file.xml" output-format="/text/xml"/>
>>>2. Input-/output-Format specified by the component declaration
>>>	<map:filters>
>>>		<map:filter name="prettyxml" input-format="/text/xml" 
>>>output-format="/text/xml" ... />
>>>	</map:filters>
>>>3. Output-/input-Format specified by the previous or following pipeline
>>>component
>>>	<map:produce type="uri" ref="docs/file.xhtml" 
>>>output-format="/text/xml/xhtml"/>
>>>	<!-- input- and output-format="/text/xml/xhtml" from previous
>>>pipeline component -->
>>>	<map:filter type="prettyxml"/>
>>>4. Input-/output-Format specified directly with a pipeline
>>>	<map:pipeline input-format="/text/xml" output-format="/text/xml">
>>>		<map:filter type="prettyxml"/>
>>>		...
>>>	</map:pipeline>
>>>5. If nothing from above matches then assume "none".
>>
>>eheh, I wish it was that easy ;-)
>>
>>Suppose you have a component that operates on the svg: namespace of a 
>>SAX stream only, what is the input type?
>>
>>if data types are monodimensional, the above is feasible, but Cocoon 
>>pipelines are *already* multi-dimensional and the above can't possibly 
>>work (this has been discussed extensively before for pipeline 
>>validation)
> 
> 
> You got me!
> This is something I didn't think about currently.
> Perhaps using only "/text/xml" for such cases, without dealing with derived
> XML data formats solves it?

No, you are back with no information on the type other than "this is
xml", which doesn't mean anything and doesn't contain enough information
to understand how to compose pipelines.
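
A component that cares about one namespace only can be sketched as follows. This is an illustration under assumptions, not a Cocoon transformer: Python's `xml.sax.saxutils.XMLFilterBase` stands in for a SAX pipeline stage, and the class name and both sample documents are invented. The stage acts on `svg:` elements and forwards every event untouched, whatever else the stream contains.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces
from xml.sax.saxutils import XMLFilterBase

SVG = "http://www.w3.org/2000/svg"


class SvgElementCounter(XMLFilterBase):
    """Acts only on svg: elements; every event is forwarded unchanged."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self.svg_elements = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri == SVG:
            self.svg_elements.append(local)
        # Pass the event on to the next stage regardless of its namespace.
        super().startElementNS(name, qname, attrs)


def svg_elements_in(document: bytes) -> list:
    """Runs one document through the stage and returns the svg: elements seen."""
    reader = xml.sax.make_parser()
    reader.setFeature(feature_namespaces, True)
    stage = SvgElementCounter(reader)
    stage.parse(io.BytesIO(document))
    return stage.svg_elements


# The same stage works inside XHTML and inside formatting objects.
XHTML_DOC = (
    b'<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    b'<svg xmlns="http://www.w3.org/2000/svg"><circle r="1"/></svg>'
    b"</body></html>"
)
FO_DOC = (
    b'<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">'
    b'<svg:svg xmlns:svg="http://www.w3.org/2000/svg"/>'
    b"</fo:root>"
)

print(svg_elements_in(XHTML_DOC))
print(svg_elements_in(FO_DOC))
```

The stage accepts any host vocabulary, so its input is better described as a predicate over namespaces than as one format name, and "/text/xml" alone says nothing about whether there is any SVG for it to work on.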

> 
>>>6 Pipeline Components
>>>=====================
>>
>>[snip]
>>
>>Assuming you have several structured pipelines:
>>
>>  - SAX -> all xml/sgml content
>>  - output/input streams -> unstructured text/binary
>>  - OLE -> all OLE-based files (word, excel, blah blah)
>>  - MPEG -> all MPEG-based framed multimedia (MPEG1/2, mp3)
>>
>>why would you want to mix them into the same system?
>>
>>I mean, if you want to apply structured-pipeline architectures to, say, 
>>audio editing, you are welcome to do so, but why in hell should Cocoon 
>>have to deal with this?
> 
> 
> Because ...
> * it provides a good framework for these tasks

these tasks? what? generation of 3d rendering on the server? there are
much better frameworks to do 3d rendering, video/audio editing, or for
calling unix pipeline command line things.

> * more and more data processing is done in XML (even publishing, 3D, music,
> ...)

so why do you need another data pipeline?

> * it is necessary to integrate both for migration from legacy data formats
> to XML

we are already doing this thru adaptation of non-xml data formats to SAX
events and back.
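
As an illustration of that adaptation, here is a minimal sketch of a generator that turns a non-XML format into SAX events. It is written against Python's `xml.sax` rather than Cocoon's Java interfaces; the function name, the element names `rows` and `row`, and the sample data are all assumptions, and column names are assumed to be valid XML element names.

```python
import csv
import io
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def csv_to_sax(text: str, handler: ContentHandler) -> None:
    """Adapts CSV text to SAX events: one <row> per record, one element per column."""
    rows = csv.DictReader(io.StringIO(text))
    no_attrs = AttributesImpl({})
    handler.startDocument()
    handler.startElement("rows", no_attrs)
    for row in rows:
        handler.startElement("row", no_attrs)
        for column, value in row.items():
            handler.startElement(column, no_attrs)
            handler.characters(value)
            handler.endElement(column)
        handler.endElement("row")
    handler.endElement("rows")
    handler.endDocument()


# Any SAX consumer can sit downstream; here XMLGenerator serializes the events.
out = io.StringIO()
csv_to_sax("name,lang\ncocoon,java\n", XMLGenerator(out, encoding="utf-8"))
print(out.getvalue())
```

Once the data is a SAX stream, every existing SAX-based component can process it, which is the point of adapting at the edges instead of adding new pipeline types.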

>>You are very close to winning the prize for the FS-award of the year :)
> 
> 
> Oh, what a privilege ;-)
> 
> 
>>It *would* make sense to add these complexities only if processing 
>>performed in different realms could be interoperated. But I can't see
>>how.
>>
>>what does it mean to perform xslt-transformation on a video stream?
>>
>>what does it mean to perform audio mixing on an email?
> 
> 
> The 'misuse' you sketched will be detected through the use of data formats:
> * An XSLT-Transformer will only operate on "/text/xml"
> * An Audio-Mixer will only operate on "/abstract/sound"

Bingo. So why should they live in the same project?

> 
>>It would not make any sense to add functionalities inside cocoon that do 
>>not belong in the realm of its problem space. It would only dilute the 
>>effort in the additional complexity only for sake of flexibility.
> 
> 
> Cocoon is already used for data integration in many areas.

Integration means 'adaptation'. You are describing pipelines that *DO*
*NOT* collaborate, just share the same environment and description markup.

> The possibilities of data integration should not stop with the Reader
> component

Readers are supposed to *read*. Period. They do not do data integration
at any stage.

> and converting every legacy data format to XML before processing
> it is not always possible.

Right. So, if it's not possible, Cocoon is not the right tool for you.
Easy enough.

> [snip]
> 
> 
>>>7.1 Web Services
>>>----------------
>>>
>>>As many of you know there are two popular styles to use Web Services:
>>>SOAP and REST.
>>>Both have their own advantages and disadvantages but I'd like to
>>>concentrate on SOAP and on its transport protocol independence, because
>>>REST-style Web Services are already possible to do with Cocoon.
>>>
>>>SOAP allows us to use any transport protocol to deliver SOAP messages.
>>>Mostly HTTP(S) is used therefore, but there are many cases, where you
>>>have to use other protocols (like SMTP, FTP, ...).
>>>Whatever protocol you choose to invoke your Web Services the result
>>>should be always the same and the response should be delivered back
>>>through (mostly) the same protocol. Here is one of the greatest
>>>advantages of the protocol independence.
>>
>>No, this is not protocol independence. This is transport independence, 
>>you are still dependent on SOAP as a protocol.
> 
> 
> What I meant was 'transport protocol independence'.
> 
> [snip]
> 
> 
>>>8 Protocol Handler
>>>==================
>>
>>I don't think Cocoon should implement protocol handlers. Cocoon is a 
>>data producer, should not deal with transport.
> 
> 
> I agree, that it is not the task of cocoon to deal with transporting.
> But Cocoon does this already to a certain degree with the HTTP protocol
> (headers!) and is therefore bound to the HTTP protocol.

SMTP has headers.

> You can't easily serialize an SVG to a jpeg and deliver it via eMail.

We are already working on extending the Environment to do that.

> So if I want to be able to deliver the output of a pipeline via different
> transport channels I have to break up this tight binding to HTTP.

How familiar are you with the Cocoon Environment classes?

> 
>>We already have enough problems to try to come up with an Environment 
>>that could work with both email and web (which have orthogonal 
>>client/server paradigms), I don't want to further increase the 
>>complexity down this road.
> 
> 
> I know that this means additional complexity, but currently this complexity
> is already hidden in other components (Reader, Serializer) and therefore
> mixed with different concerns.

Serializers are adapters from SAX to the outside world of data formats.
Readers read.

They have different concerns and they are very well separated. Where is
the mix?

> Why should an SVG2JPEG Serializer have to deal with HTTP headers?

The Serializer has to deal with Environment headers. How these headers
are translated depends on the Environment implementation which,
currently, is either web or command line and in the future will be mail.
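
The split between serializer and Environment can be sketched like this. The classes below are hypothetical and written in Python for brevity; they are not Cocoon's actual Environment interface, and the method names are assumptions. The serializer states the content type once, and each environment decides how to translate it.

```python
from abc import ABC, abstractmethod
from email.message import EmailMessage


class Environment(ABC):
    """Hypothetical environment contract; not Cocoon's actual Java interface."""

    @abstractmethod
    def set_content_type(self, mime_type: str) -> None:
        ...

    @abstractmethod
    def write(self, data: bytes) -> None:
        ...


class HttpEnvironment(Environment):
    """Translates the content type into an HTTP response header."""

    def __init__(self):
        self.headers = {}
        self.body = b""

    def set_content_type(self, mime_type):
        self.headers["Content-Type"] = mime_type

    def write(self, data):
        self.body += data


class MailEnvironment(Environment):
    """Translates the same information into a MIME mail message."""

    def __init__(self, subject: str):
        self.message = EmailMessage()
        self.message["Subject"] = subject
        self._mime_type = "application/octet-stream"

    def set_content_type(self, mime_type):
        self._mime_type = mime_type

    def write(self, data):
        maintype, subtype = self._mime_type.split("/", 1)
        self.message.set_content(data, maintype=maintype, subtype=subtype)


def serialize_jpeg(env: Environment, image_bytes: bytes) -> None:
    """The serializer talks to the Environment only, never to HTTP or SMTP."""
    env.set_content_type("image/jpeg")
    env.write(image_bytes)


web = HttpEnvironment()
mail = MailEnvironment(subject="rendered figure")
for env in (web, mail):
    serialize_jpeg(env, b"\xff\xd8 placeholder bytes, not a real image \xff\xd9")
print(web.headers["Content-Type"], mail.message.get_content_type())
```

In this sketch the serializer never mentions a transport, so adding a mail environment requires no change to it.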

> I think separation of concerns is not the case here.

I can't see how your proposed architecture can improve the use of
headers if not thru an adaptation system that would be comparable to
what we are using for the Environment.

[snip]

Anyway, I think you are making the most common mistake of software
architects: software design by symmetry instead of following real-user
requirements. This is a *BIG* and dangerous anti-pattern that normally
kills software projects or bloats them into gigantic messes.

Cocoon has been evolving thru progressive refinement of real-world
requirements. I can't see any in your outline.

No, web services don't suffice. I have yet to see a real use of them,
and Microsoft is pushing SOAP exactly because it's bloated and
paper-driven (something that they know how to politically control,
unlike HTTP and SMTP).

Sorry if I sound negative, but the impact of the architectural changes
you propose would be terrible for our user base and I don't want to
see it happen.

-- 
Stefano Mazzocchi                               <stefano@apache.org>
    Pluralitas non est ponenda sine necessitate [William of Ockham]
--------------------------------------------------------------------





