cocoon-dev mailing list archives

From Berin Loritsch <blorit...@apache.org>
Subject [FYI] How TreeProcessor Works
Date Fri, 24 Oct 2003 13:52:54 GMT
TreeProcessor is a complicated beast, and examining the classes alone
gives few clues about what is going on.  The key to understanding
TreeProcessor is the treeprocessor-builtins.xml file.

We have an XML document with the following DTD:

<!DOCTYPE tree-processor [
   <!ELEMENT tree-processor (language+)>
   <!ELEMENT language (namespace, file, parameter, roles, nodes)>
   <!ATTLIST language
     name CDATA #REQUIRED
     class CDATA #REQUIRED
     pool-min CDATA #IMPLIED
     pool-max CDATA #IMPLIED
   >
   <!ELEMENT namespace EMPTY>
   <!ATTLIST namespace uri CDATA #REQUIRED>
   <!ELEMENT file EMPTY>
   <!ATTLIST file name CDATA #REQUIRED>
   <!ELEMENT parameter EMPTY>
   <!ATTLIST parameter element CDATA #REQUIRED>
   <!ELEMENT roles (role+)>
   <!ELEMENT role (hint*)>
   <!ATTLIST role
     name CDATA #REQUIRED
     shorthand CDATA #REQUIRED
     default-class CDATA #REQUIRED
   >
   <!ELEMENT hint EMPTY>
   <!ATTLIST hint
     shorthand CDATA #REQUIRED
     class CDATA #REQUIRED
   >
   <!ELEMENT nodes (node+)>
   <!ELEMENT node (allowed-children*, ignored-children*, forbidden-children*)>
   <!ATTLIST node
     name CDATA #REQUIRED
     builder CDATA #REQUIRED
   >
   <!ELEMENT allowed-children (#PCDATA)>
   <!ELEMENT ignored-children (#PCDATA)>
   <!ELEMENT forbidden-children (#PCDATA)>
]>

So with a mock XML slimmed down to just the simplest state:

<tree-processor>
   <language name="sitemap"
       class="org.apache.cocoon.components.treeprocessor.sitemap.SitemapLanguage"
       pool-min="1" pool-max="1">

     <namespace uri="http://apache.org/cocoon/sitemap/1.0"/>
     <file name="sitemap.xmap"/>
     <parameter element="parameter"/>

     <!-- roles skipped because they are irrelevant -->

     <nodes>
       <node name="pipelines"
           builder="org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNodeBuilder">
         <allowed-children>pipeline, handle-errors</allowed-children>
         <ignored-children>component-configurations</ignored-children>
         <forbidden-children>sitemap, components, pipelines</forbidden-children>
       </node>
     </nodes>
   </language>
</tree-processor>

What is happening here is that we define a sitemap tree parser by first
identifying how to recognize the sitemap: the namespace for the XML,
the default file name, how to recognize the "parameter" element (special
to TreeProcessor semantics).  I skipped the roles definition because in
Cocoon 2.2 it won't be needed.  However, it describes the default types
of components that the tree processor expects.
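
To make the reading of that definition concrete, here is a minimal sketch
that pulls the identifying pieces (namespace, file name, parameter element)
and the node-to-builder mapping out of the mock XML with the JDK's DOM
parser.  LanguageSummary and its fields are hypothetical names made up for
illustration; this is not how Cocoon itself loads the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical holder for one <language> entry; not a Cocoon class. */
class LanguageSummary {
    String name;
    String namespaceUri;
    String fileName;
    String parameterElement;
    // node name -> builder class name, in document order
    final Map<String, String> builders = new LinkedHashMap<String, String>();

    // The mock XML from the message, with the child lists left out.
    static final String SAMPLE =
        "<tree-processor>"
      + "<language name='sitemap'"
      + " class='org.apache.cocoon.components.treeprocessor.sitemap.SitemapLanguage'>"
      + "<namespace uri='http://apache.org/cocoon/sitemap/1.0'/>"
      + "<file name='sitemap.xmap'/>"
      + "<parameter element='parameter'/>"
      + "<nodes>"
      + "<node name='pipelines'"
      + " builder='org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNodeBuilder'/>"
      + "</nodes>"
      + "</language>"
      + "</tree-processor>";

    static LanguageSummary read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element language = (Element) doc.getElementsByTagName("language").item(0);
            LanguageSummary s = new LanguageSummary();
            s.name = language.getAttribute("name");
            s.namespaceUri = childAttribute(language, "namespace", "uri");
            s.fileName = childAttribute(language, "file", "name");
            s.parameterElement = childAttribute(language, "parameter", "element");
            NodeList nodes = language.getElementsByTagName("node");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element node = (Element) nodes.item(i);
                s.builders.put(node.getAttribute("name"), node.getAttribute("builder"));
            }
            return s;
        } catch (Exception e) {
            throw new IllegalStateException("could not read language definition", e);
        }
    }

    // Reads one attribute from the first descendant element with the given name.
    private static String childAttribute(Element parent, String child, String attribute) {
        Element element = (Element) parent.getElementsByTagName(child).item(0);
        return element.getAttribute(attribute);
    }

    public static void main(String[] args) {
        LanguageSummary s = read(SAMPLE);
        System.out.println(s.name + ": " + s.fileName + " in " + s.namespaceUri);
        System.out.println(s.builders);
    }
}
```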

The Nodes section is the heart of the system.  It maps XML elements to
Builder objects which perform some sort of logic.  The child elements
"allowed-children", "ignored-children", and "forbidden-children" act as
a "poor man's" DTD so to speak.  At least they provide some explicit
processing hints that augment a DTD.  In the example above, the
"pipeline" and "handle-errors" are child nodes that are explicitly
allowed and handled from inside the "pipelines" node.  The
"component-configurations" node is allowed to exist as a child of
the "pipelines" node, but no processing occurs.  Lastly, the
"forbidden-children" element identifies nodes that cannot exist as
a child of the "pipelines" node.
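
The three lists can be read as a small decision table.  The sketch below
is only an illustration of that reading: ChildRules and Verdict are
hypothetical names, and what should happen to a child that appears in none
of the lists is not stated above, so the sketch reports it as UNLISTED and
leaves the decision to the caller.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of one node's child rules; not a Cocoon class. */
class ChildRules {
    enum Verdict { PROCESS, SKIP, REJECT, UNLISTED }

    private final Set<String> allowed;
    private final Set<String> ignored;
    private final Set<String> forbidden;

    ChildRules(String[] allowed, String[] ignored, String[] forbidden) {
        this.allowed = new HashSet<String>(Arrays.asList(allowed));
        this.ignored = new HashSet<String>(Arrays.asList(ignored));
        this.forbidden = new HashSet<String>(Arrays.asList(forbidden));
    }

    /** The rules of the "pipelines" node from the example. */
    static ChildRules forPipelines() {
        return new ChildRules(
            new String[] { "pipeline", "handle-errors" },
            new String[] { "component-configurations" },
            new String[] { "sitemap", "components", "pipelines" });
    }

    Verdict check(String childName) {
        if (forbidden.contains(childName)) {
            return Verdict.REJECT;    // may not appear as a child at all
        }
        if (allowed.contains(childName)) {
            return Verdict.PROCESS;   // handed to its own builder
        }
        if (ignored.contains(childName)) {
            return Verdict.SKIP;      // tolerated, but nothing is built for it
        }
        return Verdict.UNLISTED;      // behaviour not specified in the message
    }

    public static void main(String[] args) {
        ChildRules rules = forPipelines();
        for (String child : new String[] { "pipeline", "component-configurations", "sitemap", "match" }) {
            System.out.println(child + " -> " + rules.check(child));
        }
    }
}
```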

All the enumerated elements (separated by a comma and any amount of
whitespace) must be declared nodes so that they can be processed.
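
A minimal sketch of that rule, assuming the lists are split on commas with
surrounding whitespace trimmed.  ChildListParser is a hypothetical helper,
and the set of declared node names is passed in by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for the comma-separated child lists. */
class ChildListParser {

    /** Splits "a,  b ,c" into [a, b, c], dropping empty entries. */
    static List<String> parse(String text) {
        List<String> names = new ArrayList<String>();
        for (String part : text.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Returns the listed names that have no matching node declaration. */
    static List<String> undeclared(String text, Set<String> declaredNodes) {
        List<String> missing = new ArrayList<String>();
        for (String name : parse(text)) {
            if (!declaredNodes.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Only "pipelines" is declared in the slimmed-down example.
        Set<String> declared = new HashSet<String>(Arrays.asList("pipelines"));
        System.out.println(undeclared("sitemap, components, pipelines", declared));
    }
}
```

Run against the slimmed-down example, a check like this would flag every
name except "pipelines", which is expected: the mock XML declares only one
node, while the real file declares many more.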

In theory, XSP pages *could* be implemented with the TreeBuilder, but
in practice, you cannot predict the schemas used for elements other
than the XSP specific ones.  The TreeProcessor is best suited for fully
encapsulated schemas that act as a sort of language like the Sitemap.

This at least is the base theory behind the TreeProcessor--as far as I can
tell.  Please let me know if I am missing it somewhere.

As to implementation, the TreeBuilder creates a hierarchy of ECM
implementations that add any necessary components and Builder components.
The particularly troublesome portion of this is the use of the Recomposable
interface.

The whole issue with the Recomposable interface as it is used here is that
the child and parent component managers are constantly overwriting each other.
This is a serious conflict, and it will break as soon as we proxy components.
The proxied components hide any lifecycle interfaces so that no rogue client
can usurp the component manager, or any other part of the lifecycle of a
component, which makes for a more stable system.
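
To show why proxying gets in the way of recomposing, here is a sketch
using the JDK's java.lang.reflect.Proxy.  Greeter, ManagerAware,
PlainGreeter and WorkOnlyProxy are hypothetical stand-ins (ManagerAware
plays the part of a lifecycle interface); none of this is Avalon or Cocoon
code.  Because the proxy implements only the work interface, a client
cannot cast it to the lifecycle interface and swap the manager.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

/** Hypothetical work interface that clients are meant to see. */
interface Greeter {
    String greet(String who);
}

/** Hypothetical lifecycle interface that clients should not reach. */
interface ManagerAware {
    void recompose(Object manager);
}

class PlainGreeter implements Greeter, ManagerAware {
    private Object manager;

    public String greet(String who) {
        return "hello " + who;
    }

    public void recompose(Object manager) {
        this.manager = manager;
    }
}

class WorkOnlyProxy {
    /** Wraps target so that only workInterface is visible to the caller. */
    static <T> T expose(final Object target, Class<T> workInterface) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the component's own exception
            }
        };
        Object instance = Proxy.newProxyInstance(
            workInterface.getClassLoader(), new Class<?>[] { workInterface }, handler);
        return workInterface.cast(instance);
    }

    public static void main(String[] args) {
        Greeter greeter = expose(new PlainGreeter(), Greeter.class);
        System.out.println(greeter.greet("cocoon"));
        System.out.println("lifecycle visible: " + (greeter instanceof ManagerAware));
    }
}
```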

The Recomposable calls scare me because they look like something that would
work under low load but break down under high load.  With something
like Cocoon that is a big issue.  I don't have any numbers to show everyone;
it is just a feeling I get from looking at the code.
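
The kind of failure being worried about can be shown with a deliberately
simplified, single-threaded sketch.  It does not reproduce the
TreeProcessor code; NamedManager, SharedComponent and RecomposeHazard are
hypothetical names, and the interleaving is written out by hand rather
than produced by real concurrent requests.  The general hazard is that a
shared instance keeps only the last manager it was given.

```java
/** Hypothetical stand-in for a component manager. */
class NamedManager {
    private final String name;

    NamedManager(String name) {
        this.name = name;
    }

    String lookup(String role) {
        return name + ":" + role;
    }
}

/** Hypothetical component shared between two users. */
class SharedComponent {
    private NamedManager manager;

    // Each call silently replaces whatever manager was set before.
    void recompose(NamedManager manager) {
        this.manager = manager;
    }

    String resolve(String role) {
        return manager.lookup(role);
    }
}

class RecomposeHazard {
    /** Plays out one unlucky ordering of two users of the same instance. */
    static String interleave() {
        NamedManager parent = new NamedManager("parent");
        NamedManager child = new NamedManager("child");
        SharedComponent shared = new SharedComponent();

        shared.recompose(parent);           // user 1 sets up its manager
        shared.recompose(child);            // user 2 overwrites it before user 1 is done
        return shared.resolve("generator"); // user 1 now resolves against the child
    }

    public static void main(String[] args) {
        System.out.println(interleave());
    }
}
```

Under low load the two steps rarely interleave, which is why this sort of
problem tends to stay hidden until traffic goes up.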

As to the nitty gritty details of how the node tree is built and run, I am
still somewhat fuzzy on the details.  I know we have a bunch of NodeBuilders,
which instantiate the Nodes, which in turn are special components.  The
NodeBuilders can be viewed as a sort of intelligent object creator, but I
am not sure whether the Nodes are components with relaxed requirements on
the constructor, or if the Nodes are simple objects.  The Nodes are what
do the hard work.  Once the tree is built, the builders are not necessary
any more (unless you want to keep building new trees).
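
The builder/node split can be sketched as follows, under the assumption
that each element name maps to one builder and that a builder receives the
already-built children.  ProcessingNode, NodeFactory, ElementSpec and
TreeAssembler are hypothetical names; the real NodeBuilder and Node
contracts in Cocoon are richer than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical runtime node: does the work once the tree exists. */
interface ProcessingNode {
    String invoke(String request);
}

/** Hypothetical builder: turns one element into one node. */
interface NodeFactory {
    ProcessingNode build(String name, List<ProcessingNode> children);
}

/** Hypothetical parsed element: just a name and its child elements. */
class ElementSpec {
    final String name;
    final List<ElementSpec> children;

    ElementSpec(String name, ElementSpec... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }
}

class TreeAssembler {
    private final Map<String, NodeFactory> builders;

    TreeAssembler(Map<String, NodeFactory> builders) {
        this.builders = builders;
    }

    /** Builds children first, then hands them to the element's builder. */
    ProcessingNode assemble(ElementSpec spec) {
        NodeFactory factory = builders.get(spec.name);
        if (factory == null) {
            throw new IllegalArgumentException("no builder declared for <" + spec.name + ">");
        }
        List<ProcessingNode> built = new ArrayList<ProcessingNode>();
        for (ElementSpec child : spec.children) {
            built.add(assemble(child));
        }
        return factory.build(spec.name, built);
    }

    /** A builder whose nodes just echo their name and their children's output. */
    static NodeFactory echoFactory() {
        return (name, children) -> request -> {
            StringBuilder out = new StringBuilder(name).append('(');
            if (children.isEmpty()) {
                out.append(request);
            } else {
                for (int i = 0; i < children.size(); i++) {
                    if (i > 0) {
                        out.append(',');
                    }
                    out.append(children.get(i).invoke(request));
                }
            }
            return out.append(')').toString();
        };
    }

    /** Builds a small tree; the assembler and its builders are dropped afterwards. */
    static ProcessingNode sampleTree() {
        Map<String, NodeFactory> builders = new HashMap<String, NodeFactory>();
        builders.put("pipelines", echoFactory());
        builders.put("pipeline", echoFactory());
        ElementSpec root = new ElementSpec("pipelines",
            new ElementSpec("pipeline"), new ElementSpec("pipeline"));
        return new TreeAssembler(builders).assemble(root);
    }

    public static void main(String[] args) {
        System.out.println(sampleTree().invoke("request"));
    }
}
```

Only the returned node tree is needed to handle a request; the assembler
and its builder map go out of scope as soon as sampleTree() returns.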

I know I want to have a new Container per sitemap, but I think I need some
help in mapping it to this problem space.  Ovidiu, do you think you could at
least spare some guidance?

-- 

"They that give up essential liberty to obtain a little temporary safety
  deserve neither liberty nor safety."
                 - Benjamin Franklin


