cocoon-dev mailing list archives

From Alan <alan-coc...@engrm.com>
Subject Re: [RT] Cocoon Input Model
Date Fri, 27 Feb 2004 21:00:35 GMT
* Daniel Fagerstrom <danielf@nada.kth.se> [2004-02-25 15:49]:

> Why Cocoon rocks for publishing
> -------------------------------
> 
> Cocoon is based on three great ideas: XML-adaptors, XML-pipelines and 
> the sitemap. Here we will discuss the first two.
> 
> If you have N different input formats and M output formats you need N*M 
> converters for converting from every input format to every output format. 
> This complexity can be reduced to N+M by finding a standard format...

> Having a common format (XML) also makes it worthwhile to write tools 
> that use that format both as input and output (e.g. XSLT), and we can 
> use the pipes and filter pattern to build complex transformations in 
> terms of smaller specialized, reusable filters.
> 
> 
> Dataflow in (web)apps
> ---------------------

> and for (web)apps:

> [Input format (user) -> Output format (storage)] -> webapp -> [Input 
> format (storage) -> Output format]

This is how I've built my LAMP applications. The first thing is to develop a
    database. Then everything goes into the database before it comes back out.
    Even if the application only keeps session data, I build a database. It is
    a matter of course.

    What other ways are there of handling data? Does anyone keep things in
    memory, or simply regurgitate the input in their applications?

    If so, then there are two pipeline designs, an input / output pipeline pair
    and a pass-through pipeline. 
    
> As we can see publishing has one conversion step and (web)apps has two. 
> In [1] I talked about input and output pipelines for the two conversion 
> steps.

I'd like to expand on this. Currently Cocoon treats storage as a filter.
    Things like the SQLTransformer filter the stream to store data, then pass
    it onward to a serializer.

    What you are proposing is a pipeline that terminates not at a
    serializer, but at something else, that somehow stores the XML. Then it
    kicks off a new pipeline that terminates at a serializer.

To my mind, rather than have these parameter things in the sitemap, I'd much
    rather have everything kept as XML.

    Session information, for example. I am going to use Momento to keep a
    session document, and then skip that name/value pair nonsense. 
    
    Somewhere in my CForms pipelines, I transform input into an XUpdate
    statement and build a sub document in a Momento document. Then I can
    aggregate or cinclude that session document.
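
A minimal sketch of that form-to-XUpdate step, in Python purely for illustration (Cocoon would do this with a transformer in the pipeline). It uses the XUpdate vocabulary from the XML:DB working draft; the function name `form_to_xupdate` and the `/session` select path are hypothetical, and how Momento consumes the result is not shown.

```python
import xml.etree.ElementTree as ET

XUPDATE_NS = "http://www.xmldb.org/xupdate"


def form_to_xupdate(fields, select="/session"):
    """Build an XUpdate document that appends one element per form field.

    `select` is a hypothetical path to the session sub-document.
    """
    ET.register_namespace("xupdate", XUPDATE_NS)
    root = ET.Element(f"{{{XUPDATE_NS}}}modifications", {"version": "1.0"})
    append = ET.SubElement(root, f"{{{XUPDATE_NS}}}append", {"select": select})
    for name, value in fields.items():
        # Each form field becomes an xupdate:element carrying the value as text.
        element = ET.SubElement(append, f"{{{XUPDATE_NS}}}element", {"name": name})
        element.text = value
    return ET.tostring(root, encoding="unicode")


print(form_to_xupdate({"username": "jbond", "page": "2"}))
```

The point of the sketch is only that the name/value pairs disappear early: everything downstream sees a single XML update statement.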

> Comparing input and output pipelines, input handling has one main 
> source of extra complexity: we cannot trust user input. We need to check 
> that the input is correct and take different action dependent on that, 
> so as a consequence control structure becomes more complicated when we 
> have user input.

> A further reason for detailed control of user input is that while the output
> tends to go from strongly typed data (db:s, Java etc) to loosely typed data; in
> presentation most things are strings. Input tends to have the opposite
> requirement, from strings to typed data.

Okay, here is the strongly typed part of it, my apologies, I understand now.

Strongly typed data, but first...

    Your solution is nice, except that your N+M is missing something now.
    There are N different input formats, M different output formats, and of
    course, S different storage formats.
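
A rough sketch of the arithmetic once storage is counted, in Python purely for illustration. It assumes one reading of the dataflow quoted earlier: without a common format every input/storage and storage/output pairing needs its own converter, while with XML as the hub each input and output format needs one adaptor and each storage format needs two (one to write, one to read). Both function names are hypothetical.

```python
def direct_converters(n_inputs, n_outputs, n_storage):
    """Converters needed when every pairing is written by hand."""
    # input -> storage on the way in, storage -> output on the way out
    return n_inputs * n_storage + n_storage * n_outputs


def hub_converters(n_inputs, n_outputs, n_storage):
    """Adaptors needed when everything passes through one common format."""
    # storage counts twice: it is written by the input side and read by the output side
    return n_inputs + 2 * n_storage + n_outputs


# Three input formats, four output formats, three storage formats.
print(direct_converters(3, 4, 3))  # 3*3 + 3*4 = 21
print(hub_converters(3, 4, 3))     # 3 + 2*3 + 4 = 13
```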

    Consider an e-mail account registration form. On the first page, users
    tell us who they are and choose a password. On the second page, we ask
    them to choose which junk newsletters they want to receive.

The information arrives and becomes XML. Now maybe I want to put the
    XML in three different storage areas. Say I want to store the username and
    password in an LDAP directory, the user's profile and such in a
    relational database, and the fact that the user is now on the second page
    of the registration wizard as session information.

I think it is easy enough to validate and construct strongly typed data once
    the input is an XML format. You can use XML Schema, Relax NG, and such to
    validate information in the pipeline, then transform it to XUpdate or
    ebXML, or SQL statements, to feed to an XML consumer.
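
A minimal sketch of the validate-then-transform idea, in Python purely for illustration. The Python standard library has no XML Schema or Relax NG validator, so a hand-written table of field types stands in for the schema here; `FIELD_TYPES`, `to_typed`, `to_insert`, and the `<registration>` document shape are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical field types standing in for a real schema (XML Schema, Relax NG).
FIELD_TYPES = {"username": str, "age": int}


def to_typed(xml_text):
    """Return (typed_values, errors) for a flat <registration> document."""
    root = ET.fromstring(xml_text)
    values, errors = {}, {}
    for name, convert in FIELD_TYPES.items():
        raw = root.findtext(name)
        if raw is None:
            errors[name] = "missing"
            continue
        try:
            values[name] = convert(raw.strip())
        except ValueError:
            errors[name] = f"not a valid {convert.__name__}"
    return values, errors


def to_insert(table, values):
    """Build a parameterized INSERT; values travel separately from the SQL text.

    Table and column names are interpolated, so they must come from trusted
    code (here, the keys of FIELD_TYPES), never from user input.
    """
    columns = ", ".join(values)
    placeholders = ", ".join("?" for _ in values)
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", list(values.values())


values, errors = to_typed(
    "<registration><username>jbond</username><age>42</age></registration>"
)
if not errors:
    print(to_insert("users", values))
```

The errors dictionary is keyed by field name, which is roughly what an error document for a web service caller would need to carry.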

    For form input, CForms provides validation of form entries in a way that is
    interactive and associates mistakes with the source widget in the
    interface. If you were to offer a web service however, you would have to
    have a way to validate XML that would return an error document of a
    different nature, thus Relax NG, Schematron, etc. 

    (You go on to say this yourself. Good. I'll snip it but I agree.)


> Is Cocoon that great for input handling?
> ----------------------------------------
> 
> We see that the situation for input handling has become quite similar 
> to that for output: many input formats and many output formats. But in 
> contrast to the output scenario we have no common design patterns for 
> handling the complexity.

And this makes it very difficult for new users like myself. New users seem to
    get the pipeline concepts quickly, and then stumble on the various input
    concepts. Such has been the case for me. I've been very creative in the
    page generation part of my web site (http://engrm.com), using FOP, multiple
    transforms, cinclude, aggregation. I still have only written one example
    CForm application, however.

    If anything, pipelines in will be easy to teach once people understand
    pipelines out.

> In some cases we have components that convert 
> directly from input format to storage format. In other cases we use a 
> format between input and storage, but this format can be a hashmap, java 
> beans, the Woody widget hierarchy or XML in form of DOM or SAX. In some 
> of the cases we also have validation mechanisms for the middle format.
> 
> This lack of a common accepted pattern for input handling leads to: less 
> reuse, multiple components that do similar things and a lack of a 
> common focal point. An example of this is the discussion about 
> Cocoon/relational database coupling: we have multiple ways to go from 
> RDBs to XML, but no components for the opposite direction...

I'd better jump into this discussion then. I've considered a language that would
    express a database document as XML, and a tool that would compare that
    document to the database, only updating what is necessary.

    <xd:document xmlns:xd="http://engrm.com/schema/2004/02/rosetta">
      <xd:record table="employee">
        <xd:column name="employee-id" key="primary">007</xd:column>
        <xd:column name="first-name">James</xd:column>
        <xd:column name="last-name">Bond</xd:column>
        <xd:record table="employee-department">
          <xd:record table="department">
            <xd:column name="department-id">mi6</xd:column>
            <xd:column name="department-name"
                       >Secret Intelligence Service</xd:column>
          </xd:record>
        </xd:record>
      </xd:record>
    </xd:document>

    The above document would have all the information necessary to update a
    database where:

    employee          employee-department       department
    -----------       -------------------       ---------------
    employee-id <---> employee-id
    first-name        department-id       <---> department-id
    last-name                                   department-name

If it isn't enough, I suppose you add some form of functional programming or
    direct execution of SQL statements.
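
A minimal sketch of the compare-and-update tool, in Python purely for illustration; no such tool exists yet. It assumes the `xd` prefix is bound to the rosetta namespace URI, models the database as a plain dictionary with one row per table, and ignores key propagation for the join table. The names `records`, `changes`, and `current` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"xd": "http://engrm.com/schema/2004/02/rosetta"}


def records(doc_text):
    """Yield (table, columns) for every xd:record that carries columns of its own."""
    root = ET.fromstring(doc_text)
    for record in root.iter("{%s}record" % NS["xd"]):
        # findall without a leading .// only looks at direct children,
        # so nested records keep their own columns.
        columns = {c.get("name"): (c.text or "").strip()
                   for c in record.findall("xd:column", NS)}
        if columns:
            yield record.get("table"), columns


def changes(doc_text, current):
    """Compare the document with `current` ({table: row dict}); keep only differing columns."""
    needed = {}
    for table, columns in records(doc_text):
        existing = current.get(table, {})
        diff = {k: v for k, v in columns.items() if existing.get(k) != v}
        if diff:
            needed[table] = diff
    return needed


doc = """<xd:document xmlns:xd="http://engrm.com/schema/2004/02/rosetta">
  <xd:record table="employee">
    <xd:column name="employee-id" key="primary">007</xd:column>
    <xd:column name="first-name">James</xd:column>
    <xd:column name="last-name">Bond</xd:column>
  </xd:record>
</xd:document>"""

# Pretend the stored row has a different first name.
current = {"employee": {"employee-id": "007", "first-name": "Jim", "last-name": "Bond"}}
print(changes(doc, current))
```

Only the columns that differ come back, which is the part that would be turned into UPDATE statements.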

> The solution ;)
> ---------------
> 
> IMO we have an obvious solution to this situation right before our eyes: 
> adapt the patterns that we already use for output handling, i.e. 
> adaptors and pipelines, to input handling as well. To do this we must 
> decide about a common format. The candidates are: hashmaps, Java beans, 
> Woody widget hierarchy and XML.

I vote for XML. At this point, a person can use Cocoon as a publishing
    platform without adding Java. Please keep it that way.

    Cocoon output is like a delta. Cocoon input should be like a funnel.

    Rather than running towards a serializer from a generator, input should run
    from a deserializer to a consumer.

    For my purposes, I'd like to have all input filter into an XML Cocoon
    pipeline that funnels everything into a transform that produces XUpdate
    fit for consumption by Momento.

> I think that using XML has _huge_ advantages:
> 
> * Cocoon is an XML based framework and uses XML as its internal format 
> almost everywhere. When one uses the Woody widget hierarchy one has to 
> translate back and forth between XML and Woody all the time, which at 
> least IMO is a waste of time.
> 
> * XML is standardized, and there are an enormous amount of tools that 
> use it. For Woody widgets, we have to do everything ourselves.
> 
> * There are well designed schemas for XML: XML Schema, and if you don't 
> like that: Relax-NG. As the rest of the XML world use XML data types we 
> get an impedance mismatch between the Woody data types and XML.

Yes. Yes. Yes.

> What does this mean in practice?
> --------------------------------
> 
> This far I have, (fairly strongly I suppose ;) ), suggested that we should 
> use XML as the standardized internal format for all input handling in 
> Cocoon...

> Untyped XML is not enough, so we also need typed XML. Here I consider a 
> DOM with a schema attached to it, so that one can [re]validate the DOM, 
> ask the nodes and the leaves if they are valid and what datatype they 
> have and also access valid leaves in terms of the corresponding Java 
> data type. I think something like this should be possible to build by 
> combining a DOM implementation, e.g. Xerces, with Sun Multi Schema 
> Validator (MSV) and XSDLIB [2].

> To make DOM easy to use within flowscripts it would be nice to write 
> Rhino binding code (scriptable object) so that one can use the Ecma 
> script API for DOM. It is also a good idea to use a DOM implementation 
> that implements DOM events, so that one can write flowscript code in the 
> same style as client side JS.

I would like to understand what this DOM versus SAX stuff is about. Do you
    want to input data by editing a DOM? I'd much rather express an update
    statement in XUpdate. 

-- 
Alan / alan@engrm.com / http://engrm.com/
    aim/yim: alanengrm - icq: 228631855 - msn: alanengrm@hotmail.com
