cocoon-dev mailing list archives

From Daniel Fagerstrom <dani...@nada.kth.se>
Subject [RT] Cocoon Input Model
Date Wed, 25 Feb 2004 15:49:41 GMT
The current discussion about Cocoon database connections and some 
frustration about the complexity of connecting Woody based webapps to 
XML make me think that it is time for a new discussion about the 
principles for Cocoon input handling (see [1] for an earlier discussion).

But before discussing input handling I would like to recall briefly 
why Cocoon is so great for output handling (publishing).


Why Cocoon rocks for publishing
-------------------------------

Cocoon is based on three great ideas: XML-adaptors, XML-pipelines and 
the sitemap. Here we will discuss the first two.

If you have N different input formats and M output formats you need N*M 
converters to convert from every input format to every output format. 
This complexity can be reduced to N+M by finding a standard format (e.g. 
XML) and performing the conversion in two steps: first from the input to 
the standard format, and then from the standard format to the output 
format. In Cocoon, generators take care of the first step and 
serializers of the second. If we only have a few input and output 
formats the extra complexity of the two step process is probably not 
worthwhile. But as we add formats it becomes increasingly painful to 
add N converters for each new output format and M new converters for 
each new input format.
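
As a rough illustration of the arithmetic above (not part of the 
original post; the class name and the format counts are made up), the 
two strategies can be compared like this:

```java
public class ConverterCount {
    // One dedicated converter for every (input format, output format) pair.
    static int direct(int inputs, int outputs) {
        return inputs * outputs;
    }

    // One adaptor per input format plus one per output format,
    // all meeting in a common format such as XML.
    static int viaCommonFormat(int inputs, int outputs) {
        return inputs + outputs;
    }

    public static void main(String[] args) {
        int n = 5, m = 4; // hypothetical numbers of input and output formats
        System.out.println("direct converters:     " + direct(n, m));
        System.out.println("via common format:     " + viaCommonFormat(n, m));
        // Cost of adding one more output format under each strategy.
        System.out.println("extra for new output (direct): "
                + (direct(n, m + 1) - direct(n, m)));
        System.out.println("extra for new output (common): "
                + (viaCommonFormat(n, m + 1) - viaCommonFormat(n, m)));
    }
}
```

With these assumed counts the direct approach needs 20 converters and 
the common format approach 9, and a new output format costs 5 new 
converters in the first case but only 1 in the second.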

Having a common format (XML) also makes it worthwhile to write tools 
that use that format both as input and output (e.g. XSLT), and we can 
use the pipes and filters pattern to build complex transformations in 
terms of smaller, specialized, reusable filters.
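
A minimal sketch of the pipes and filters idea, under the assumption 
that plain strings stand in for the common format (a real Cocoon 
pipeline passes SAX events between transformers, which this does not 
model). The class and the three filters are hypothetical:

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class FilterPipeline {
    // Chain filters left to right: the output of one filter
    // becomes the input of the next.
    static UnaryOperator<String> pipeline(List<UnaryOperator<String>> filters) {
        return input -> {
            String current = input;
            for (UnaryOperator<String> filter : filters) {
                current = filter.apply(current);
            }
            return current;
        };
    }

    // Three small, reusable filters combined into one transformation.
    static String demo() {
        UnaryOperator<String> trim = String::trim;
        UnaryOperator<String> upper = String::toUpperCase;
        UnaryOperator<String> wrap = s -> "<title>" + s + "</title>";
        return pipeline(List.of(trim, upper, wrap)).apply("  Cocoon  ");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because every filter reads and writes the same format, any of them can 
be reordered, removed or reused in another pipeline without touching 
the others.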


Dataflow in (web)apps
---------------------

Now, what about (web)-applications in Cocoon? Here the general pattern 
is: get input in some format from the user and store it in some format 
that the business logic can use. When the business logic has done its 
work, the ordinary (web)-publishing mechanism, i.e. a pipeline, can be 
used for showing the result.

So, looking at the data flow, a (simplified) view of publishing is:

[Input format -> Output format]

and for (web)apps:

[Input format (user) -> Output format (storage)] -> webapp -> [Input 
format (storage) -> Output format]

As we can see, publishing has one conversion step and (web)apps have 
two. In [1] I talked about input and output pipelines for the two 
conversion steps.

Comparing input and output pipelines, input handling has one main 
source of extra complexity: we cannot trust user input. We need to check 
that the input is correct and take different actions depending on the 
result, so the control structure becomes more complicated when we have 
user input. A further reason for detailed control of user input is that 
output tends to go from strongly typed data (DBs, Java etc.) to loosely 
typed data: in presentation most things are strings. Input tends to have 
the opposite requirement, going from strings to typed data.
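
To make the "strings to typed data" direction concrete, here is a small 
sketch (not from the original post; the class, method and parameter 
names are invented for illustration). It converts a raw request 
parameter to an Integer and records an error instead of trusting the 
input:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypedInput {
    // Convert one raw string parameter to an Integer. Invalid or missing
    // input is reported in 'errors' and yields null, so the caller can
    // choose a different action (e.g. redisplay the form).
    static Integer intParam(Map<String, String> params, String name,
                            Map<String, String> errors) {
        String raw = params.get(name);
        if (raw == null || raw.trim().isEmpty()) {
            errors.put(name, "missing");
            return null;
        }
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            errors.put(name, "not an integer: " + raw);
            return null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("quantity", " 3 "); // hypothetical user input
        params.put("age", "abc");
        Map<String, String> errors = new LinkedHashMap<>();
        Integer quantity = intParam(params, "quantity", errors);
        Integer age = intParam(params, "age", errors);
        System.out.println(quantity + " " + age + " " + errors);
    }
}
```

The branch on the error map is exactly the extra control structure that 
publishing pipelines never need.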


Is Cocoon that great for input handling?
----------------------------------------

We can see that for input we need three things: more sophisticated 
control, which is solved with flowscripts; a mechanism for describing 
and validating the form of the input data; and possibly a mechanism to 
add type information to input data.

How is this handled in Cocoon?

In the beginning there was only one input format: request parameters, 
i.e. a hashmap. It can be checked by a FormValidatorAction and stored in 
a db by a [Modular]DatabaseAction or used in Java code by writing a 
specialized action.

Now, unordered and unstructured input data like a hashmap is not enough 
for more advanced user interfaces. [XML|JX]Forms and later Woody 
introduced going from path-like request parameter names to data 
structures: in [XML|JX]Forms by writing/reading directly to DOM or Java 
bean structures, and in Woody by introducing a form model, the widget 
hierarchy.
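
The step from path-like parameter names to a data structure can be 
sketched as follows. This is only an illustration using the standard 
JAXP DOM API, not how [XML|JX]Forms or Woody actually do it; the class 
name and the slash-separated path syntax are assumptions, and all paths 
are assumed to share the same root element:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ParamsToDom {
    // Build a DOM from parameters named like "/order/customer/name".
    static Document toDom(Map<String, String> params) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        for (Map.Entry<String, String> entry : params.entrySet()) {
            Node current = doc;
            for (String step : entry.getKey().split("/")) {
                if (step.isEmpty()) continue; // skip the leading slash
                current = childNamed(doc, current, step);
            }
            current.setTextContent(entry.getValue());
        }
        return doc;
    }

    // Return the existing child element with this name, or create it.
    static Element childNamed(Document doc, Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        Element created = doc.createElement(name);
        parent.appendChild(created);
        return created;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("/order/customer/name", "Ada"); // hypothetical form fields
        params.put("/order/quantity", "3");
        Document doc = toDom(params);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The flat hashmap becomes a tree, but every leaf is still an untyped 
string; adding types is a separate step.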

In Woody the structure and data types of the widget hierarchy, among 
other things, are defined in a widget definition file. Woody also 
contains a validation mechanism working on the widget hierarchy, and 
bidirectional conversion between the widget hierarchy and Java data 
structures and DOM respectively.

Besides request parameters and "structured" request parameters, XML is 
also used as user input: for WebDAV and web service applications, and 
increasingly from more advanced user clients. And with new environments 
like mail, CLI, JMS and possibly more, we will get even more user input 
formats. As storage formats we have various database types, file 
systems, DOM etc.

We see that the situation for input handling has become quite similar 
to that for output: many input formats and many output formats. But in 
contrast to the output scenario we have no common design patterns for 
handling the complexity. In some cases we have components that convert 
directly from input format to storage format. In other cases we use a 
format between input and storage, but this format can be a hashmap, Java 
beans, the Woody widget hierarchy or XML in the form of DOM or SAX. In 
some of the cases we also have validation mechanisms for the middle 
format.

This lack of a commonly accepted pattern for input handling leads to 
less reuse, multiple components that do similar things and the lack of 
a common focal point. An example of this is the discussion about 
Cocoon/relational database coupling: we have multiple ways to go from 
RDBs to XML, but no components for the opposite direction; we have 
actions that go in both directions between hashmaps and RDBs, and 
others that go in both directions between Java data structures and RDBs.


The solution ;)
---------------

IMO we have an obvious solution to this situation right before our eyes: 
adapt the patterns that we already use for output handling, i.e. 
adaptors and pipelines, to input handling as well. To do this we must 
decide on a common format. The candidates are: hashmaps, Java beans, 
the Woody widget hierarchy and XML.

We already have an action based framework for using hashmaps, but it is 
questionable whether unstructured data is enough for more advanced 
applications. Java beans require IMO too much work, and they would also 
require all Cocoon users to be Java programmers. The remaining 
candidates, Woody widget hierarchies and XML, have a lot in common. Both 
are hierarchical data structures. Both contain (or can contain) typed 
data (an XML document together with a schema is a typed data structure).

While the Woody widget structure has some things in its favor (we 
already have working validation in it and easy connection to Java data 
types), I think that using XML has _huge_ advantages:

* Cocoon is an XML based framework and uses XML as its internal format 
almost everywhere. When one uses the Woody widget hierarchy one has to 
translate back and forth between XML and Woody all the time, which at 
least IMO is a waste of time.

* XML is standardized, and there is an enormous number of tools that 
use it. For Woody widgets, we have to do everything ourselves.

* There are well designed schema languages for XML: XML Schema, and if 
you don't like that, RELAX NG. As the rest of the XML world uses XML 
data types, we get an impedance mismatch between the Woody data types 
and XML.


What does this mean in practice?
--------------------------------

So far I have (fairly strongly, I suppose ;) ) suggested that we should 
use XML as the standardized internal format for all input handling in 
Cocoon, so that we can use the adaptor and pipes and filters patterns 
for input as well as output. What does this mean in practice?

In part we already have the mechanisms, e.g. within flowscripts one can 
use a pipeline that processes the input through 
processPipelineTo[DOM|SAX|Stream]. The pipeline input can come from 
request params or from the inputStream (in the servlet environment) via 
"module:inputStream", and any generator can adapt it to XML.

But in many cases using SAX based XML as in pipelines is not enough: we 
need a data structure, i.e. DOM. This leads to flowscript components 
that read some input format into DOM, and that go from DOM to some 
output format or some store. We will also need flowscript components 
that go from DOM to DOM.

Untyped XML is not enough, so we also need typed XML. Here I consider a 
DOM with a schema attached to it, so that one can [re]validate the DOM, 
ask the nodes and the leaves if they are valid and what datatype they 
have, and also access valid leaves in terms of the corresponding Java 
data type. I think something like this should be possible to build by 
combining a DOM implementation, e.g. Xerces, with Sun Multi Schema 
Validator (MSV) and XSDLIB [2].
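
A very reduced sketch of the "[re]validate the DOM" part, using the 
standard JAXP validation API (javax.xml.validation) rather than MSV and 
XSDLIB as proposed above. It only answers "is the whole DOM valid?"; it 
does not give per-node validity or typed access to leaves. The class 
name and the small order schema are made up for the example:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TypedDomCheck {
    // Hypothetical schema: an order with a positive integer quantity.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='order'><xs:complexType><xs:sequence>"
      + "<xs:element name='quantity' type='xs:positiveInteger'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for schema validation
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Validate the DOM in its current state; can be called again
    // after the tree has been modified.
    static boolean isValid(Document doc) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false; // the validator reported an error
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<order><quantity>3</quantity></order>");
        System.out.println(isValid(doc));
        // Simulate bad user input written into the DOM, then revalidate.
        doc.getElementsByTagName("quantity").item(0).setTextContent("three");
        System.out.println(isValid(doc));
    }
}
```

A typed DOM as described above would go further and keep the validation 
result and datatype on each node, so that a flowscript could ask a 
single leaf whether it is valid.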

CForms should IMO use the typed DOM described above as its form model 
instead of the current proprietary Java structure.

To make DOM easy to use within flowscripts it would be nice to write 
Rhino binding code (scriptable objects) so that one can use the 
ECMAScript API for DOM. It is also a good idea to use a DOM 
implementation that implements DOM events, so that one can write 
flowscript code in the same style as client side JS.

                                    --- o0o ---

To summarize: I think that we could make Cocoon considerably easier to 
use for (web)apps and increase reuse of components by using the 
XML-adaptor and pipes and filters patterns for input as well.

WDYT?

/Daniel

References
----------

[1] [RT] Input Pipelines (long)
http://marc.theaimsgroup.com/?t=104008605100003&r=1&w=2

[2] MSV
https://msv.dev.java.net/


