From Daniel Fagerstrom <>
Subject Re: [RT] Cocoon Input Model
Date Fri, 05 Mar 2004 18:02:54 GMT
Alan wrote:

> * Daniel Fagerstrom <> [2004-02-25 15:49]:
>>Why Cocoon rocks for publishing
>>Cocoon is based on three great ideas: XML-adaptors, XML-pipelines and 
>>the sitemap. Here we will discuss the first two.
>>If you have N different input formats and M output formats you need N*M 
>>converters for converting from every input format to every output format.
>>This complexity can be reduced to N+M by finding a standard format...
>>Having a common format (XML) also makes it worthwhile to write tools 
>>that use that format both as input and output (e.g. XSLT), and we can
>>use the pipes and filters pattern to build complex transformations in
>>terms of smaller specialized, reusable filters.
>>Dataflow in (web)apps
>>and for (web)apps:
>>[Input format (user) -> Output format (storage)] -> webapp -> [Input 
>>format (storage) -> Output format]
> This is how I've built my LAMP applications. The first thing is to develop a
>     database. Then everything goes into the database before it comes back out.
>     Even if the application only keeps session data, I build a database. It is
>     a matter of course.
>     What other ways are there of handling data? Does anyone keep things in
>     memory, or simply regurgitate the input in their applications?
>     If so, then there are two pipeline designs, an input / output pipeline pair
>     and a pass-through pipeline. 
>>As we can see, publishing has one conversion step and (web)apps have two.
>>In [1] I talked about input and output pipelines for the two conversion 
> I'd like to expand on this. Currently Cocoon treats storage as a filter.
>     Things like the SQLTransformer filter streams to store data then onward to
>     a serializer.
>     What you are proposing is a pipeline that terminates not at a
>     serializer, but at something else, that somehow stores the XML. Then it
>     kicks off a new pipeline that terminates at a serializer.

Yes, that can either be done in flowscript by something like 
processPipelineTo[...], something like 
processPipelineToModifyableSource(pipe, source, args). Another
possibility is a new store sitemap component, but as you know by now,
new sitemap components are quite a controversial subject ;)
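
The "pipeline that ends in a store, followed by a pipeline that ends in a serializer" idea can be sketched outside Cocoon. The following is only an illustration in Python, not flowscript: run_pipeline, store_sink and the in-memory store dictionary are hypothetical names standing in for the pipeline, the modifiable source and the storage, and none of them are Cocoon APIs.

```python
import xml.etree.ElementTree as ET

def run_pipeline(document, filters, sink):
    """Pass an XML element through each filter in order, then hand it to the sink."""
    for f in filters:
        document = f(document)
    return sink(document)

# Hypothetical in-memory stand-in for a modifiable source (a place to write to).
store = {}

def store_sink(key):
    """Build a sink that stores the document under `key` instead of serializing it."""
    def sink(doc):
        store[key] = ET.tostring(doc, encoding="unicode")
        return key
    return sink

def serializer(doc):
    """Sink for the output side: turn the element into text for the client."""
    return ET.tostring(doc, encoding="unicode")

def normalize_name(doc):
    # Example input filter: trim whitespace in the <name> element.
    name = doc.find("name")
    name.text = name.text.strip()
    return doc

def to_html(doc):
    # Example output filter: wrap the stored name in a paragraph.
    p = ET.Element("p")
    p.text = "Hello, " + doc.find("name").text
    return p

# Input pipeline: terminates at the store, not at a serializer.
user_input = ET.fromstring("<user><name>  Alan </name></user>")
key = run_pipeline(user_input, [normalize_name], store_sink("session/user"))

# Output pipeline: starts from the stored document and terminates at a serializer.
stored = ET.fromstring(store[key])
page = run_pipeline(stored, [to_html], serializer)
```

The only difference between the two runs is the sink at the end, which is the point of the proposal: the filters in between stay ordinary XML-to-XML steps.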

> To my mind, rather than have these parameter things in the sitemap, I'd much
>     rather have everything kept as XML.
>     Session information, for example. I am going to use Momento to keep a
>     session document, and then skip that name/value pair nonsense. 

That's the idea; "session document" sounds like a good name.

>     Somewhere in my CForms pipelines, I transform input into an XUpdate
>     statement and build a sub document in a Momento document. Then I can
>     aggregate or cinclude that session document.
>>Comparing input and output pipelines, input handling has one main
>>source of extra complexity: we cannot trust user input. We need to check 
>>that the input is correct and take different action dependent on that, 
>>so as a consequence control structure becomes more complicated when we 
>>have user input.
>>A further reason for detailed control of user input is that while output
>>tends to go from strongly typed data (DBs, Java etc.) to loosely typed data (in
>>presentation most things are strings), input tends to have the opposite
>>requirement: from strings to typed data.
> Okay, here is the strongly typed part of it, my apologies, I understand now.
> Strongly typed data, but first...
>     Your solution is nice, except that your N+M is missing something now.
>     There are N different input formats, M different output formats, and of
>     course, S different storage formats.

The general idea is that sources and modifiable sources describe places. 
A generator reads a certain format from a source (place) and converts it 
to XML. A serializer converts from XML to a specified format that in 
turn could be fed into a modifiable source (a place). So the output and 
the storage format are the same, but we have I+M+N+S, where I and S are the
number of input sources and output sources respectively, instead of
I*M*N*S if we were to write components that go from input source to
output source in one step.
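
To make the difference concrete, here is a small worked example in Python. The counts (3 input sources, 4 input formats, 5 output formats, 2 output sources) are made up purely for illustration.

```python
def direct_components(i, n, m, s):
    """One dedicated component per (input source, input format,
    output format, output source) combination."""
    return i * n * m * s

def adapter_components(i, n, m, s):
    """One source per input place, one generator per input format,
    one serializer per output format, one modifiable source per output place."""
    return i + n + m + s

# Illustrative, made-up counts.
i, n, m, s = 3, 4, 5, 2
direct = direct_components(i, n, m, s)    # 3 * 4 * 5 * 2 = 120
adapted = adapter_components(i, n, m, s)  # 3 + 4 + 5 + 2 = 14
```

Adding one more output source costs one new component in the adaptor design, but 3 * 4 * 5 = 60 new components in the one-step design.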

>     Consider an e-mail account user registration form, first page they tell us
>     who they are and choose a password. Second page, we ask them to choose
>     which junk newsletters they want to receive.
>     The information arrives and becomes XML. Now maybe I want to put the
>     XML in three different storage areas. Say I want to store the username and
>     password in an LDAP directory, the user's profile and such in a
>     relational database, and the fact that the user is now on the second page
>     of the registration wizard as session information.
> I think it is easy enough to validate and construct strongly typed data once
>     the input is an XML format. You can use XML Schema, Relax NG, and such to
>     validate information in the pipeline, then transform it to XUpdate or
>     ebXML, or SQL statements, to feed to an XML consumer.
>     For form input, CForms provides validation of form entries in a way that is
>     interactive and associates mistakes with the source widget in the
>     interface. If you were to offer a web service however, you would have to
>     have a way to validate XML that would return an error document of a
>     different nature, thus Relax NG, Schematron, etc. 
>     (You go on to say this yourself. Good. I'll snip it but I agree.)
>>Is Cocoon that great for input handling?
>>We see that the situation for input handling has become quite similar
>>to that for output: many input formats and many output formats. But in 
>>contrast to the output scenario we have no common design patterns for 
>>handling the complexity.
> And this makes it very difficult for new users like myself. New users seem to
>     get the pipeline concepts quickly, and then stumble on the various input
>     concepts. Such has been the case for me. I've been very creative in the
>     page generation part of my web site (, using fop, multiple
>     transforms, cinclude, aggregation. I still have only written one example
>     CForm application, however.
>     If anything, pipelines in will be easy to teach once people understand
>     pipelines out.

Absolutely, we need a common design pattern for how to build Cocoon 
applications. That will make it easier for new Cocoon users and it will 
lead to more reusable components for webapps.

>>In some cases we have components that convert
>>directly from input format to storage format. In other cases we use a
>>format between input and storage, but this format can be a hashmap, Java
>>beans, the Woody widget hierarchy or XML in form of DOM or SAX. In some
>>of the cases we also have validation mechanisms for the middle format.
>>This lack of a commonly accepted pattern for input handling leads to: less
>>reuse, multiple components that do similar things and a lack of a
>>common focal point. An example of this is the discussion about 
>>Cocoon/relational database coupling: we have multiple ways to go from 
>>RDBs to XML, but no components for the opposite direction...
> I better jump into this discussion then. I've considered a language that would
>     express a database document as XML, and a tool that would compare that
>     document to the database only updating what is necessary.
>     <xd:document xmlns="">
>       <xd:record table="employee">
>         <xd:column name="employee-id" key="primary">007</xd:column>
>         <xd:column name="first-name">James</xd:column>
>         <xd:column name="last-name">Bond</xd:column>
>         <xd:record table="employee-department">
>           <xd:record table="department">
>             <xd:column name="department-id">mi6</xd:column>
>             <xd:column name="department-name"
>                        >Secret Intelligence Service</xd:column>
>           </xd:record>
>         </xd:record>
>       </xd:record>
>     </xd:document>
>     The above document would have all the information necessary to update a
>     database where:
>     employee          employee-department
>     -----------       -------------------       department
>     employee-id <---> employee-id               ---------------
>     first-name        department-id       <---> department-id
>     last-name                                   department-name
> If it isn't enough, I suppose you add some form of functional programming or
>     direct execution of SQL statements.

I suggest something like that in, it is based on
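
As a rough illustration of the "compare the document to the database and only update what is necessary" idea quoted above, here is a minimal Python sketch using sqlite3. It handles only the flat employee record, not the nested join records. The namespace URI urn:example:xd and the function name sync_record are assumptions made for this sketch, since the original namespace is not given.

```python
import sqlite3
import xml.etree.ElementTree as ET

XD = "urn:example:xd"  # hypothetical namespace; the original URI is not given

DOC = f"""
<xd:document xmlns:xd="{XD}">
  <xd:record table="employee">
    <xd:column name="employee-id" key="primary">007</xd:column>
    <xd:column name="first-name">James</xd:column>
    <xd:column name="last-name">Bond</xd:column>
  </xd:record>
</xd:document>
"""

def sync_record(conn, record):
    """Insert the record, or update only the columns whose values differ.

    Returns the list of column names that were written. Identifiers are
    taken from the document unchecked; a real tool would have to validate
    them against the database schema first.
    """
    table = record.get("table")
    cols = record.findall(f"{{{XD}}}column")
    values = {c.get("name"): c.text for c in cols}
    key = next(c.get("name") for c in cols if c.get("key") == "primary")

    row = conn.execute(
        f'SELECT * FROM "{table}" WHERE "{key}" = ?', (values[key],)
    ).fetchone()
    if row is None:
        names = ", ".join(f'"{n}"' for n in values)
        marks = ", ".join("?" for _ in values)
        conn.execute(f'INSERT INTO "{table}" ({names}) VALUES ({marks})',
                     list(values.values()))
        return list(values)

    # Compare column by column and write only the differences.
    changed = [n for n in values if row[n] != values[n]]
    for n in changed:
        conn.execute(f'UPDATE "{table}" SET "{n}" = ? WHERE "{key}" = ?',
                     (values[n], values[key]))
    return changed

# Set up a database that already holds a slightly different row.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute('CREATE TABLE "employee" ("employee-id" TEXT PRIMARY KEY, '
             '"first-name" TEXT, "last-name" TEXT)')
conn.execute('INSERT INTO "employee" VALUES (?, ?, ?)', ("007", "Jim", "Bond"))

record = ET.fromstring(DOC).find(f"{{{XD}}}record")
changed = sync_record(conn, record)
```

Here only the first-name column is rewritten, because it is the only value that differs between the document and the stored row.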

>>The solution ;)
>>IMO we have an obvious solution to this situation right before our eyes:
>>adapt the patterns that we already use for output handling, i.e.
>>adaptors and pipelines, to input handling as well. To do this we must 
>>decide about a common format. The candidates are: hashmaps, Java beans, 
>>Woody widget hierarchy and XML.
> I vote for XML. At this point, a person can use Cocoon as a publishing
>     platform without adding Java. Please keep it that way.
>     Cocoon output is like a delta. Cocoon input should be like a funnel.
>     Rather than running towards a serializer from a generator, input should run
>     from a deserializer to a consumer.
>     For my purposes, I'd like to have all input filter into an XML Cocoon
>     pipeline that funnels everything into a transform that produces XUpdate
>     fit for consumption by Momento.
>>I think that using XML has _huge_ advantages:
>>* Cocoon is an XML based framework and uses XML as internal format
>>almost everywhere. When one uses the Woody widget hierarchy one has to
>>translate back and forth between XML and Woody all the time, which at
>>least IMO is a waste of time.
>>* XML is standardized, and there is an enormous number of tools that
>>use it. For Woody widgets, we have to do everything ourselves.
>>* There are well designed schemas for XML: XML Schema, and if you don't
>>like that: Relax-NG. As the rest of the XML world uses XML data types we
>>get an impedance mismatch between the Woody data types and XML.
> Yes. Yes. Yes.
>>What does this mean in practice?
>>So far I have (fairly strongly, I suppose ;) ) suggested that we should
>>use XML as the standardized internal format for all input handling in 
>>Untyped XML is not enough, so we also need typed XML. Here I consider a 
>>DOM with a schema attached to it, so that one can [re]validate the DOM,
>>ask the nodes and the leaves if they are valid and what datatype they 
>>have and also access valid leaves in terms of the corresponding Java 
>>data type. I think something like this should be possible to build by 
>>combining a DOM implementation, e.g. Xerces, with Sun Multi Schema 
>>Validator (MSV) and XSDLIB [2].
>>To make DOM easy to use within flowscripts it would be nice to write 
>>Rhino binding code (scriptable object) so that one can use the ECMAScript
>>API for DOM. It is also a good idea to use a DOM implementation
>>that implements DOM events, so that one can write flowscript code in the 
>>same style as client side JS.
> I would like to understand what this DOM versus SAX stuff is about. Do you
>     want to input data base editing DOM? I'd much rather express an update
>     statement in XUpdate. 

No, I don't want to edit a DB over the DOM API. For webapps you
typically have to allow the "session document" to be partly invalid and
incomplete. For more fine grained access to the session store during a
user interaction I think DOM is good; with an ECMA interface to the DOM
tree it would also fit quite well in flowscripts. For more transactional
stuff DOM write is of course bad.
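
A minimal Python sketch of what such a partly invalid "session document" with typed leaf access could look like. The SCHEMA table is a hypothetical stand-in for a real schema (XML Schema or Relax NG), and SessionDocument is an illustrative name, not an existing Cocoon, Xerces or MSV class.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical per-path type table standing in for a real schema.
SCHEMA = {
    "user/name": str,
    "user/age": int,
    "user/birthday": date.fromisoformat,
}

class SessionDocument:
    """An XML tree that may be incomplete or partly invalid at any time."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def typed(self, path):
        """Return the leaf at `path` converted to its declared type.

        Raises ValueError if the leaf is missing, empty or not convertible.
        """
        leaf = self.root.find(path)
        if leaf is None or not (leaf.text or "").strip():
            raise ValueError(f"{path}: missing")
        return SCHEMA[path](leaf.text.strip())

    def is_valid(self, path):
        """Ask a single leaf whether it is valid, without raising."""
        try:
            self.typed(path)
            return True
        except ValueError:
            return False

    def invalid_paths(self):
        """Revalidate the whole document and list the leaves that fail."""
        return [p for p in SCHEMA if not self.is_valid(p)]

# The user has filled in a name, a bad age and no birthday yet.
session = SessionDocument(
    "<session><user><name>Alan</name><age>thirty</age></user></session>"
)
problems = session.invalid_paths()   # the age and the birthday fail
name = session.typed("user/name")    # valid leaves are still accessible

# A later interaction corrects the age in place; the leaf is now typed.
session.root.find("user/age").text = "30"
age = session.typed("user/age")
```

The document is never rejected as a whole: invalid leaves are reported, valid ones can already be read as typed values, and the tree can be revalidated after each user interaction.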


