cocoon-dev mailing list archives

From Irv Salisbury <irv.salisb...@gmail.com>
Subject Re: [RT][long] Cocoon 3.0: the necessary mutation
Date Fri, 02 Dec 2005 21:58:25 GMT
I realize I am not a committer or anything, but hopefully that doesn't
prevent me from replying...

We have built a number of large Cocoon-based applications over the
past 2.5 years.  Everything you said in here is great, and I would love
it if Cocoon had it.  I won't even comment on what you have written, as
I agree with all of it.

I would like to add a few more comments:

1. CForms - Our recent app has about 175 CForms.  Each of the binding,
definition, and template files was a cocoon: URL.  For simple forms,
having to write 3 files was a real pain and very time consuming.  So we
started building our own meta file that contained enough information to
generate all 3 files on the fly.  I am sure others have done so as well.  It
seems like for the simple cases (or even the medium cases) we should be able
to get away with writing just 1 or 2 files.  CForms is great, and making it
easier to work with would be great.
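
As a rough illustration of the meta-file idea (a sketch only, not the
implementation described above): one small description is expanded into a
definition, a binding, and a template.  The meta format and all function
names are hypothetical.  The fd:/fb:/ft: element names follow the CForms
vocabularies as commonly documented and should be checked against your
Cocoon version; the template is a bare skeleton without an action attribute.

```javascript
// Hypothetical meta description: one entry per field. All names are illustrative.
const formMeta = {
  fields: [
    { id: "name", label: "Name", type: "string", path: "name" },
    { id: "age", label: "Age", type: "integer", path: "age" },
  ],
};

// Escape a value for use in XML text or attribute content.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Form definition: one fd:field per meta entry.
function toDefinition(meta) {
  const widgets = meta.fields.map(f =>
    `    <fd:field id="${escapeXml(f.id)}">\n` +
    `      <fd:label>${escapeXml(f.label)}</fd:label>\n` +
    `      <fd:datatype base="${escapeXml(f.type)}"/>\n` +
    `    </fd:field>`);
  return [
    `<fd:form xmlns:fd="http://apache.org/cocoon/forms/1.0#definition">`,
    `  <fd:widgets>`,
    ...widgets,
    `  </fd:widgets>`,
    `</fd:form>`,
  ].join("\n");
}

// Binding: maps each widget id to a path in the bound object or document.
function toBinding(meta) {
  const values = meta.fields.map(f =>
    `  <fb:value id="${escapeXml(f.id)}" path="${escapeXml(f.path)}"/>`);
  return [
    `<fb:context xmlns:fb="http://apache.org/cocoon/forms/1.0#binding" path="/">`,
    ...values,
    `</fb:context>`,
  ].join("\n");
}

// Template: a label and a widget placeholder per field.
function toTemplate(meta) {
  const rows = meta.fields.map(f =>
    `  <ft:widget-label id="${escapeXml(f.id)}"/> <ft:widget id="${escapeXml(f.id)}"/>`);
  return [
    `<ft:form-template xmlns:ft="http://apache.org/cocoon/forms/1.0#template" method="POST">`,
    ...rows,
    `</ft:form-template>`,
  ].join("\n");
}

const generated = {
  definition: toDefinition(formMeta),
  binding: toBinding(formMeta),
  template: toTemplate(formMeta),
};
```

Anything a simple form does not need (validation, repeaters, custom
styling) is left out here; a real meta file would need a way to fall back
to hand-written files for those cases.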

2. Memory usage.  Now, maybe StAX helps with this, but I don't think so.
With our latest app, which was pretty large, we had pipelines with many, many
steps in them.  Often, these steps would be simple things that just change 1
or 2 elements in the XML stream.  When you are building a large Cocoon app,
you often need lots of data flowing through the system.  Each step in the
pipeline generates SAX events and memory garbage, even if you don't need
them.  To get around this, we wound up using XMLBeans and having Java objects
"flowing" through the system that can be turned into XML on the fly.  When
you have a large Cocoon app with lots of users, there is a lot more memory
usage than with other frameworks.
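
The "objects flowing through the system" approach can be sketched roughly
as follows.  This is a plain JavaScript analogy, not XMLBeans: the order
record, the step list, and the function names are all made up for
illustration.  Each step touches one or two fields of the same object, and
XML is produced only once, at the point where it is actually needed.

```javascript
// Each step mutates the same record in place, so no intermediate
// event stream or copy is produced between steps.
const orderSteps = [
  order => {
    order.total = order.items.reduce((sum, item) => sum + item.price * item.qty, 0);
    return order;
  },
  order => {
    order.status = order.total > 100 ? "review" : "approved";
    return order;
  },
];

// Run every step in sequence over one record.
function runSteps(order) {
  return orderSteps.reduce((current, step) => step(current), order);
}

// Escape a value for use inside an XML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

// Serialize only at the end, when XML is really required.
function orderToXml(order) {
  return `<order id="${escapeAttr(order.id)}" status="${escapeAttr(order.status)}">` +
    `<total>${order.total}</total></order>`;
}

const processed = runSteps({ id: "42", items: [{ price: 60, qty: 2 }] });
const orderXml = orderToXml(processed);
// orderXml is '<order id="42" status="review"><total>120</total></order>'
```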

3. More integration with XMLBeans (or JAXB).  Going back and forth from
JavaFlow to XML has benefits.  (I agree with your earlier point that we
shouldn't have to start with pipelines.)  XMLBeans (or JAXB) makes this a
snap.  It has the speed of SAX and the benefit that you can go back and
forth between XML and Java easily.  We made a lot of use of this in our
recent app.  It worked great.  The downside is that you have to write an XML
Schema for all XML interactions.  (Which is also a good thing, as it is then
documented.)  I realize this wouldn't help the learning curve at all.

4. JavaFlow (or flowscript) as the ONLY pipeline language.  Now, this may
be too touchy, but if all requests entered into JavaFlow or flowscript, and
this was then used as the "orchestration" language, I think that would be
great.  Make it so you can easily aggregate pipelines, do content-based
routing, etc.  Pipelines can then be simpler, with all logic done in
flowscript or JavaFlow.  You should be able to set up all your components in
flow.  Having this in Java or JavaScript would make conditional processing
much easier, and removing the pipeline (as sacred as it is) would leave us
with only 1 language.
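
To make the "orchestration" idea a bit more concrete, here is a minimal
sketch in plain JavaScript.  It does not use any real Cocoon API: the route
table, the request shape, the view names, and dispatch() are all
assumptions for illustration.  Every request enters one function, which
does the matching, content-based routing, and aggregation that the sitemap
does today.

```javascript
// Compile a wildcard pattern such as "admin/*" into a RegExp.
// Simplification: "*" matches one path segment (no "/"); "**" is not supported.
function compileWildcard(pattern) {
  const source = pattern
    .split("*")
    .map(part => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("([^/]*)");
  return new RegExp("^" + source + "$");
}

// Hypothetical route table; the first matching route wins.
const routes = [
  {
    // Conditional processing written as ordinary code.
    pattern: "admin/*",
    handle: (request, match) =>
      request.isAdmin
        ? { status: 200, views: ["admin-" + match[1]] }
        : { status: 403, views: ["forbidden"] },
  },
  {
    // Content-based routing: the request body decides which view is built.
    pattern: "feed/*",
    handle: (request, match) => ({
      status: 200,
      views: [request.body.format === "atom" ? "atom2html" : "rss2html"],
    }),
  },
  {
    // Aggregation: one request answered with several views.
    pattern: "shop/*",
    handle: (request, match) => ({
      status: 200,
      views: ["catalogue-" + match[1], "cart-sidebar"],
    }),
  },
].map(route => ({ ...route, regex: compileWildcard(route.pattern) }));

// Single entry point for all requests.
function dispatch(request) {
  for (const route of routes) {
    const match = route.regex.exec(request.uri);
    if (match) {
      return route.handle(request, match);
    }
  }
  return { status: 404, views: [] };
}
```

For example, dispatch({ uri: "admin/users", isAdmin: false }) returns a 403
result, and dispatch({ uri: "shop/books" }) names two views to render.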

5. Continuations.  More needs to be understood about these.  We found in our
app that we spent a lot of time debugging how and when these are created,
why they are created, and how to get rid of them.  For a simple JavaFlow
call, should we really create a continuation?  If you never do a
sendPageAndWait, and simply use flow for processing, why bother with the
continuation?  It is just something else that uses memory and something we
have to worry about destroying.
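
A toy model of "only create a continuation when you actually wait" might
look like the following.  This is not how Cocoon implements flow: the class
names are invented, a resume callback stands in for the captured execution
state (plain JavaScript has no first-class continuations), and the
continuation is treated as one-shot, which is a simplification.

```javascript
// Hypothetical store of live continuations, keyed by id.
class ContinuationStore {
  constructor() {
    this.live = new Map();
    this.nextId = 1;
  }

  // Register a resume callback and return its id.
  create(resume) {
    const id = "c" + this.nextId++;
    this.live.set(id, resume);
    return id;
  }

  // Resume a continuation once, then free it immediately.
  resume(id, input) {
    const fn = this.live.get(id);
    if (!fn) {
      throw new Error("unknown or expired continuation: " + id);
    }
    this.live.delete(id);
    return fn(input);
  }
}

class Flow {
  constructor(store) {
    this.store = store;
  }

  // Plain processing: nothing is stored, so there is nothing to destroy.
  sendPage(uri, data) {
    return { uri, data, continuationId: null };
  }

  // Only a page that waits for the user allocates a continuation.
  sendPageAndWait(uri, data, resume) {
    return { uri, data, continuationId: this.store.create(resume) };
  }
}

const store = new ContinuationStore();
const flow = new Flow(store);

const plain = flow.sendPage("report", { total: 3 });
const waiting = flow.sendPageAndWait("confirm", {}, answer =>
  flow.sendPage("done", { answer }));
```

In this model the "report" call leaves the store empty, and the "confirm"
call leaves exactly one entry that disappears as soon as it is resumed.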

Cocoon is awesome, and with AJAX and other XML technologies becoming more
popular, it is slated to be THE framework to choose.  I think you are quite
right that it won't get there the way it is currently written.

Thanks for listening.

Irv

On 12/2/05, Sylvain Wallez <sylvain@apache.org> wrote:
>
> Hi all,
>
> For many years, I have been more than happy with Cocoon, enjoying the
> power and ease it brought for both publication and webapp projects. Over
> the last months however, other feelings have emerged: there are things
> that are definitely overly complex in Cocoon, and there have been some
> emerging frameworks leading to "wow, cool!" reactions rather than "yeah,
> yet another one". Also, I strongly believe that the Ajax revolution is
> quickly obsoleting the traditional reload-page-on-user-action model that
> prevailed on the web up to recently and requires frameworks that help
> building these new kinds of applications.
>
> All this to say that if we want Cocoon to have a bright future, it must
> go through a mutation. This RT dumps my current ideas about this,
> depicting something that IMO better fits today's needs while taking into
> account many years of innovation and what we learned from them.
>
>                        -- oOo --
>
> First of all, let's consider the place where Cocoon fits in the large
> family of webapp development frameworks.
>
> On one end, we have pure-J2EE things like Struts or JSF. They lead to
> writing a lot of Java classes, lots of XML config files, and try/fail
> cycles require compile/deploy/restart, even if some tools ease the task.
> Despite their heavyweight development process, they are widely accepted
> in large companies, both because the J2EE stamp pleases managers and
> because of the vast number of enterprise-grade libraries available.
>
> On the other end, we have scripted frameworks like Ruby on Rails,
> Django[1], etc. The try/fail cycle is basically save/reload, and writing
> simple stuff is very fast because of the use of convention over
> configuration and runtime generation of data models (e.g. through
> database introspection). Now recent comments[2] show that going beyond
> the basic stuff may not be that easy.
>
> Cocoon actually sits in the middle of this spectrum:
> - it's a Java servlet and can therefore use almost anything that's
> written in Java. The contents of our blocks show this well! As such it
> is somehow J2EE compliant and can be deployed in large companies, even
> if we have to convince managers that it's better than Struts.
> - it's a scripted framework: sitemap, XSL, templates, flowscript. Save
> and reload! And this goes further in 2.2 with auto reloading and
> compiling classloaders.
>
> So theoretically, Cocoon could be the "RoR of J2EE". Now in its current
> incarnation it won't. The learning curve is too steep, some
> architectural choices imposed by Cocoon actually go in the way of
> developers with the new emerging development practices, and 5 years of
> legacy led to a rather confusing picture, with tons of legacy components
> and many inconsistencies.
>
> Now Cocoon has also introduced a number of super-cool features and
> innovations that definitely make sense, but in a more lightweight and
> consistent environment.
>
> So let's draw a picture of what could be Cocoon 3.0. I use a major
> version as the ideas outlined below are more than likely to require a
> new code base, even if many code snippets can be reused from the current code.
>
>                        -- oOo --
>
> Giving its real role to the controller
> --------------------------------------
>
> When we introduced flowscript, we decided that <map:pipeline> should be
> the central switchboard through which *all* requests go, and introduced
> <map:call function>. This leads most webapps written in Cocoon to have
> their sitemap starting with something like:
>
>   <map:match pattern="do_*">
>     <map:call function="do_{1}"/>
>   </map:match>
>
> Why in hell do we have to go through the sitemap to call a function and
> then go back to the sitemap through cocoon.sendPage()? This not only
> clutters up the sitemap with cut'n pasted snippets, but also makes the
> flowscript a second-zone citizen in the application.
>
> So I think we should change a bit the semantics of the sitemap and the
> priorities of its various components:
>
> - the sitemap is the configuration of the overall request processing of
> the application (or a subpart of it in the case of mounts). It defines
> the configuration of that request processing, which is composed of
> components (<map:components>), controllers (<map:flow>) and views
> (<map:pipeline>). And I even think these 3 parts should really be split
> in different files, i.e. moving components to a xconf and pipeline
> definitions to e.g. "pipelines.xml".
>
> - the processing flow in a sitemap goes *first* in the controller if
> there is one, and *second* in the view. Going to a <map:pipeline> to
> call back a function should really be an exceptional case, or even
> forbidden.
>
> - since it's no longer called by the sitemap, the controller defines a
> single entry point, such as "process()". A builtin default
> implementation provides an equivalent to <map:call
> function="public_{request:sitemapURI}"/>, thus automatically publishing
> any public_xxx flowscript function. This is similar to
> HttpServlet.service() that calls doGet(), doPost(), etc depending on the
> HTTP method but still allows overloading service().
>
> - to allow sophisticated implementations of process() where needed, the
> matchers and selectors are made available to the controller, so that
> they can check the request environment as easily as the pipeline
> statements in sitemap.xmap (also have a look at the Django
> dispatcher[3]). This would allow writing something like:
>
>   var match = cocoon.matchers.wildcard("admin/*");
>   if (match) {
>       if (authenticateAdmin()) {
>           // call the function named by the '*'
>           adminServices[match[1]]();
>       } else {
>           forbidden();
>       }
>   }
>
> - calling cocoon.sendPage(uri) directly goes to the <map:pipeline>
> section of the sitemap, to build a view. This seems obvious, but has an
> interesting side-effect: there is no longer any need to invent a private URL
> space such as "view-*" to have a two-step processing (controller/view)
> of the requests. We can even say that "cocoon.sendPage(null)" calls the
> sitemap with the current request URI untouched.
>
> Note: the controller examples in this RT are written in JavaScript, but
> JavaFlow should be considered on an equal ground, as Cocoon's user base
> is two-sided, composed of people coming from the webdesign/php world,
> and others coming from the J2EE world. Also, JavaFlow should be the
> language of choice for builtin reusable helper controllers.
>
>                        -- oOo --
>
> Expression languages
> --------------------
>
> Do you know how many expression languages there are in Cocoon? Java,
> JavaScript, XPath, XReporter, JEXL, etc. There's also all the
> micro-languages defined by each of the input modules: many of them use
> XPath, but not all...
>
> Also, the way to access a given data is not the same in the sitemap
> (e.g. "{request-param:foo}") and in JXTG
> ("${cocoon.request.getParameter('foo')}" or even
> "#{$cocoon/request/parameters/foo}")
>
> We should restrict the number of languages to the useful minimum, and
> ensure they can be used consistently everywhere. This useful minimum
> looks to me as being JavaScript, XPath and Java (using Janino[4]).
>
> As for the syntax, I think we should use the simple "{..}" notation,
> with no initial character. To choose among the 3 expression languages,
> we have to choose a default one, and use prefixed expressions for the
> other ones. I consider JS to be the most versatile and thus to be the
> default language.
>
> That means we'll have "{cocoon.request.remoteHost}" or
> "{xpath:$cocoon/request/remoteHost}" or
> "{java:cocoon.getRequest().getRemoteHost()}".
>
> About XPath, I'm a bit skeptical wrt its actual usefulness with non-XML
> objects, which often looks weird. However, we need to be able to call
> XPath on DOM parts of a non-DOM data model, e.g.
> "{xpath(cocoon.session.attributes.userDoc, '/meta/dc:title')}". And
> interestingly this sample shows that a namespace prefix table must be
> available in the expression context for *all* languages.
>
> All this also means that we need a well-defined "cocoon" object defined
> identically in all contexts. Additional top-level objects can be
> available to provide context-specific data, such as "flow.sendPage()",
> "sitemap.resolve('../1')" or "template.consumer".
>
>                        -- oOo --
>
> Content-aware pipelines
> -----------------------
>
> Cocoon 1 had a DOM-based processing, meaning transformations could be
> chosen according to the pipeline content. Cocoon 2, when switching to
> SAX-based streamed pipelines, abandoned this ability. This hasn't been a
> real problem for a long time, as datasources were mostly passive
> documents of a well-known structure.
>
> Now things have changed a lot, and we have to deal with heterogeneous
> data types and content-driven processing. Let's take some real-life
> examples:
> - Content syndication: a feed's URL can provide RSS 0.9, 1.0, 2.0 or
> Atom. How can we decide what processing has to be applied on a feed if
> we don't know what's inside?
> - Forrest's infamous SourceTypeAction[5] identifies a document's type
> using pull parsing
> - SOAP requests: why is SOAP so badly integrated with Cocoon? We
> basically need to delegate to Axis that will then call a Java class. Why
> so? Because we're unable to choose the service to be called depending on
> the request's content.
> - finally, the ESB buzz is turning into real projects, and requires
> content-based routing of messages.
>
> There were some proposals to implement content-aware selectors[6] but
> they never materialized because of the impedance mismatch between a SAX
> stream and the usage of DOM (so Cocoon-1-ish!) that was proposed to
> implement them.
>
> Now Forrest's SourceTypeAction shows us the way: pull parsing.
>
> So let's switch pipelines from SAX push to StAX pull (JSR 173, see[7]).
> Content-aware matchers and selectors can then grab just the amount of
> information they need from the pipeline to make their decision. And
> contrary to the SourceTypeAction, which has to resolve the source 2
> times (once for pull, once for push), the pipeline engine can
> transparently buffer the StAX events consumed by matchers and selectors
> to replay them in the next pipeline component.
>
> Using pull pipelines doesn't mean we have to trash everything.
> Converting DOM to/from StAX is straightforward, and so is StAX->SAX. The
> SAX->StAX conversion is less easy and requires either buffering or a
> separate thread.
>
> Using pull pipelines also has an interesting side effect on
> aggregations, as they can easily be inlined by pulling events
> successively from partial pipelines (i.e. without a serializer), e.g:
>
>   <map:aggregate element="root">
>     <map:part>
>       <map:generate src="header.xml"/>
>       <map:transform src="header2html.xsl"/>
>     </map:part>
>     <map:part src="content/{1}.xml"/>
>   </map:aggregate>
>   <map:transform src="layout.xsl"/>
>   <map:serialize/>
>
> Actually, writing
>
>   <map:part src="foo"/>
>
> is equivalent to writing
>
>   <map:part>
>     <map:generate type="file" src="foo"/>
>   </map:part>
>
>                        -- oOo --
>
> Dynamic pipelines
> -----------------
>
> Yes, you read it well: dynamic pipelines. This is what comes next
> naturally after content-aware pipelines: with use cases like webservices
> and ESBs, the content-based routing is not enough and we also need
> controller-driven routing.
>
> For simpler cases, we already have cocoon.processPipelineTo() (and the
> more versatile PipelineUtils class), but having to call the sitemap and
> invent a private URL just to perform a transformation is really overkill.
>
> I'd like to be able to write the following in a flowscript:
>
>   var pipeline = flow.newPipeline("non-caching");
>   pipeline.setGenerator("stream");
>   pipeline.addTransformer("xslt", "normalize.xsl");
>   if (cocoon.matchers.xpath("/foo/bar[@id = '" +
>           cocoon.session.attributes.bar_id + "']", pipeline)) {
>       handleBar(pipeline);
>   } else {
>       wrongId();
>   }
>
> What we can see above is that we don't even need a serializer for the
> pipeline to be useful, as we can pull events from it as soon as it has a
> generator. And that generator could well be another pipeline built
> somewhere else.
>
> Basically, the pipeline engine becomes a very general-purpose object
> that can be used not only in the sitemap (to build views), but also in
> the controller for content-driven business logic decisions.
>
> This programmatic building of pipelines can also be used by Cocoon
> components themselves to implement some built-in transformations, e.g.
> converting an XMLSchema to a CForms definition, without having to
> copy/paste the corresponding sitemap instructions in user sitemaps, or
> having to call a system-defined sitemap. IMO, the lack of reusable
> system pipelines is one of the reasons why there haven't been many
> off-the-shelf products or applications built on top of Cocoon.
>
> Being able to directly use pipelines can also ease the integration of
> Cocoon as a transformation engine in other environments, such as an
> advanced message transformer in the ServiceMix ESB[8].
>
>                        -- oOo --
>
> Controller-driven responses
> ---------------------------
>
> The advent of Ajax applications leads to a radical change in web
> applications architectures. There are many requests that don't lead to
> producing a view, but sending data and/or control information. Having to
> call a pipeline for this is really useless and overkill, as we don't
> need any kind of processing.
>
> We therefore need the controller to be able to directly send a
> non-processed response. We already have an example of this in the Ajax
> stuff for CForms[9] to send a simple <bu:continue> when form interaction
> is finished and a full page reload is needed. Another example is data
> transmission with an Ajax client using JSON[10].
>
> So we need additional "sendxxx" methods in the controller: sendText(),
> sendObject(), sendBytes() and why not sendStream().
>
> Ajax applications also require aggregations defined at the controller
> level. Let's consider an Ajax shopping cart application: the page
> displays the items catalogue and a sidebar with the current content of
> the shopping cart. When the user browses the items, only the catalogue
> area needs to be refreshed on the page. When he adds an item to the
> cart, both areas need to be refreshed at once to show the updated cart.
> The knowledge of what parts of the page need to be refreshed is in the
> controller. A solution can be to call a pipeline that will generate an
> XInclude that itself will call other pipelines, but that's smelly and
> doesn't allow giving different view data to each of the pipelines.
>
> To allow this, we need something like:
>   flow.sendMultiple(
>       ["catalogue", { paginator: paginator }],
>       ["cart-sidebar", { cart: cart }]
>   );
>
>                        -- oOo --
>
> Core components
> ---------------
>
> Moving to pull pipelines isn't the only important core change: we need
> to move away from Avalon for good. Now what container will we use? We
> don't care: Cocoon 3.0 will be written as POJOs, and will come with a
> "default" container. Will it be Spring, Hivemind, Pico? I don't know. We
> may even provide configurations for several containers, as does
> XFire[11].
>
>                        -- oOo --
>
> OK, thanks for reading this far.
>
> My impression is that with all these changes, Cocoon will be sexy again.
> Add a bit of runtime analysis of databases and automatic generation of
> CForms to the picture, and you have something that has the same
> productivity as RoR, but in a J2EE environment. It also includes what I
> learned when working on Ajax and the consequences it has on the overall
> system architecture.
>
> You certainly have noticed that the above is more about the controller
> than about the sitemap. This is because not many changes are needed
> there, except content-aware matchers and selectors. But a more featured
> controller will allow us to trash a great number of pipeline components
> that were invented to circumvent controller limitations. The code base
> will shrink.
>
> There are also a number of simplifications that can be done by using
> builtin conventions over configuration, but I'll write about this later.
>
> Tell me your thoughts. Am I completely off-track, or do you also want to
> build this great new thing?
>
> Sylvain
>
> [1] http://www.djangoproject.com/
> [2] http://www.andrewsavory.com/blog/archives/000976.html
> [3] http://www.djangoproject.com/documentation/url_dispatch/
> [4] http://www.janino.net/
> [5] http://svn.apache.org/repos/asf/forrest/trunk/main/java/org/apache/forrest/sourcetype/SourceTypeAction.java
> [6] http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101554683923592&w=2
> [7] http://stax.codehaus.org/
> [8] http://servicemix.codehaus.org/
> [9] http://svn.apache.org/repos/asf/cocoon/blocks/forms/trunk/java/org/apache/cocoon/forms/flow/javascript/Form.js
> [10] http://bluxte.net/blog/2005-11/17-49-57.html
> [11] http://xfire.codehaus.org/
>
> --
> Sylvain Wallez                        Anyware Technologies
> http://bluxte.net                     http://www.anyware-tech.com
> Apache Software Foundation Member     Research & Technology Director
>
>
