cocoon-dev mailing list archives

Subject Non-object tree interfaces
Date Fri, 30 Mar 2001 20:08:55 GMT
Thought you guys might be interested in this, if you don't subscribe to the
xerces-j-dev list, especially in the concept of making the entire pipeline
a pull model instead of an event model (which pretty much requires threads,
unless someone has some bright ideas).  I'm curious about your thoughts.


----- Forwarded by Scott Boag/CAM/Lotus on 03/30/2001 03:05 PM -----
From:    Scott Boag
Date:    03/30/2001 03:04 PM
Subject: Non-object tree interfaces

As some of you know, over in Xalan land we are looking to revamp our
internal processing model to no longer use DOM interfaces.  The main
problem with the DOM is that a node has to be represented as an object with
identity, which requires a certain amount of resources.  I believe we've
just about reached the limit of what direct DOM processing can do.

An alternative to the DOM interfaces is an index-based API, i.e. something
like:  dtm.sourceTree.getData(nodeID), dtm.sourceTree.getNameID(nodeID),
dtm.getNextSiblingID(nodeID),  dtm.dispatchCharacterEvent(nodeID,
contentHandler), etc.  We're going to call the API "DTM", along the lines
of the original Document Table Model tree in XalanJ1.  Xalan would walk
this API directly.  DOM2DTM and DTM2DOM classes will be used for external
interfacing with foreign DOMs.
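
To make the index-based idea concrete, here is a minimal sketch in Java.
It is not the actual DTM interface: the class name ToyDTM, the addNode
builder, the NULL sentinel and the parallel-array layout are all
assumptions made for illustration.  Only the accessor names getData,
getNameID and getNextSiblingID follow the examples given above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical table-backed tree: a node is an int index, never an object. */
final class ToyDTM {
    static final int NULL = -1;                 // assumed "no such node" sentinel

    private final List<String> names = new ArrayList<>(); // list index = name ID
    private int[] nameId = new int[8];
    private int[] firstChild = new int[8];
    private int[] lastChild = new int[8];
    private int[] nextSibling = new int[8];
    private String[] data = new String[8];
    private int size = 0;

    /** Appends a node under parentID (or NULL for a root) and returns its ID. */
    int addNode(int parentID, String name, String text) {
        if (size == nameId.length) grow();
        int id = size++;
        int n = names.indexOf(name);            // share one name ID per distinct name
        if (n < 0) { names.add(name); n = names.size() - 1; }
        nameId[id] = n;
        data[id] = text;
        firstChild[id] = NULL;
        lastChild[id] = NULL;
        nextSibling[id] = NULL;
        if (parentID != NULL) {
            if (firstChild[parentID] == NULL) firstChild[parentID] = id;
            else nextSibling[lastChild[parentID]] = id;
            lastChild[parentID] = id;
        }
        return id;
    }

    /** Doubles the capacity of every column of the node table. */
    private void grow() {
        int cap = nameId.length * 2;
        nameId = Arrays.copyOf(nameId, cap);
        firstChild = Arrays.copyOf(firstChild, cap);
        lastChild = Arrays.copyOf(lastChild, cap);
        nextSibling = Arrays.copyOf(nextSibling, cap);
        data = Arrays.copyOf(data, cap);
    }

    String getData(int nodeID) { return data[nodeID]; }
    int getNameID(int nodeID) { return nameId[nodeID]; }
    String getName(int nameID) { return names.get(nameID); }
    int getFirstChildID(int nodeID) { return firstChild[nodeID]; }
    int getNextSiblingID(int nodeID) { return nextSibling[nodeID]; }
}
```

A caller walks the children of a node with a plain loop, starting at
getFirstChildID(parent) and following getNextSiblingID(id) until it
returns ToyDTM.NULL.  No per-node object is allocated during the walk.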

Along with this, we're creating a DTMManager class that will manage
multiple trees in a process.  At the very least this is needed so a node
handle can maintain unique identity among nodes from other document trees.
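
One way a manager could keep handles unique across trees is to pack a
document ID and a node index into a single int.  This is only a sketch
of that idea: the class name ToyDTMManager, the 20-bit split and the
method names are assumptions, not something the proposal specifies.

```java
/** Hypothetical handle scheme: high bits name the document, low bits the node. */
final class ToyDTMManager {
    private static final int NODE_BITS = 20;                   // assumed split
    private static final int NODE_MASK = (1 << NODE_BITS) - 1;
    private static final int MAX_DOCS = 1 << (31 - NODE_BITS); // keeps handles non-negative

    /** Combines a document ID and a node index into one handle. */
    static int makeHandle(int docID, int nodeIndex) {
        if (docID < 0 || docID >= MAX_DOCS) {
            throw new IllegalArgumentException("docID out of range: " + docID);
        }
        if (nodeIndex < 0 || nodeIndex > NODE_MASK) {
            throw new IllegalArgumentException("nodeIndex out of range: " + nodeIndex);
        }
        return (docID << NODE_BITS) | nodeIndex;
    }

    /** Recovers the document ID from a handle. */
    static int docOf(int handle) { return handle >>> NODE_BITS; }

    /** Recovers the node index within its document from a handle. */
    static int nodeOf(int handle) { return handle & NODE_MASK; }
}
```

With this layout, node 5 of document 1 and node 5 of document 2 get
different handles, so two handles can be compared for identity without
knowing which tree either one came from.  The cost is a fixed ceiling on
both the number of documents and the number of nodes per document.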

Right now we are going to do a SAX2DTM implementation that will build from
pure SAX events.

We would like XercesJ2 to eventually have a closely coupled implementation
of these interfaces, or something like them that we can agree on.  For one
thing, underneath the DTM we would like to see pull parsing implemented
(right now Xalan has to use two threads so that transformation can proceed
while the parsing is still going on).  For another thing, we
believe there are many things that can be done at the parser level to give
a high degree of optimization.  Reducing the number of character copies
that have to occur between a client of the DTM and the parser would be a
very big deal.  Also, we could do things like passing simple XPaths that
match the schema key patterns, and having Xerces do some of the work in
streaming mode, possibly enabling it to skip subtrees and the like.

Ultimately, we need to revamp Xalan so that its output is also a DTM.  Then
the transform itself can be pulled, and with it the whole pipeline.
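
The difference a pulled pipeline makes can be sketched with plain Java
iterators.  Nothing here is Xalan or Xerces code: Source stands in for a
pull parser, UpperCase for a transform stage, and both names are
hypothetical.  The point is only that each stage does work when its
consumer asks for the next item, so a single thread can drive the chain.

```java
import java.util.Iterator;
import java.util.Locale;
import java.util.NoSuchElementException;

final class PullPipelineSketch {
    /** Hypothetical "parser": hands out one token per next() call. */
    static final class Source implements Iterator<String> {
        private final String[] tokens;
        private int pos = 0;

        Source(String... tokens) { this.tokens = tokens; }

        @Override public boolean hasNext() { return pos < tokens.length; }

        @Override public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            return tokens[pos++];
        }

        /** How many tokens have been read so far. */
        int consumed() { return pos; }
    }

    /** Hypothetical "transform": reads upstream only when it is itself read. */
    static final class UpperCase implements Iterator<String> {
        private final Iterator<String> upstream;

        UpperCase(Iterator<String> upstream) { this.upstream = upstream; }

        @Override public boolean hasNext() { return upstream.hasNext(); }

        @Override public String next() {
            return upstream.next().toUpperCase(Locale.ROOT);
        }
    }
}
```

After one call to next() on the UpperCase stage, the Source has handed
out exactly one token.  The rest of the input stays unread until the
consumer asks again, which is what lets parsing and transformation
interleave without a second thread.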

I include the associated links to the repository pages for these
interfaces.  The interfaces are still sketchy right now and functionally
incomplete (for instance, we have to do work on what the exception model
should be...).

It would be really great if the Xerces community could work with us on
these interfaces, and on an eventual implementation.  The interfaces
themselves can eventually be moved to the AXDK project that we have been
talking about on the general list.

Thoughts?  Would you guys be open to collaborating on this?

Has some dependencies on:

