cocoon-dev mailing list archives

From "Scott Boag/CAM/Lotus" <>
Subject Re: [Moving on] SAX vs. DOM part II
Date Tue, 25 Jan 2000 03:34:28 GMT

Reasons to use SAX vs. DOM:

1) Performance.  Object creation in Java is very expensive.  Note that XT
duplicates the source tree to an internal structure, which costs still
more.  Though Xalan can use a generic DOM, it is happier and faster when it
can create its internal pseudo-DOM (DTM).  And both XT and Xalan build
their own internal stylesheet structure, so when you hand in a stylesheet
DOM, a similar structure is duplicated internally (LotusXSL used to use a
raw DOM for the stylesheet, but the performance penalties of this are

2a) Memory.  Every DOM node is very expensive.  In Xerces, *every* node,
including attribute nodes and zillions of whitespace nodes, has an
ownerDocument, parentNode, previousSibling, nextSibling, name, value,
readOnly, userData, firstChild, lastChild, syncChildren, syncData, and
changes member variables, all 32 bits each, and most of the nodes have
additional fields defined by the subclasses.  While it is possible to
create a more memory conservative DOM, as we tried to do with our read-only
and limited-feature Document Table Model (DTM), everything you do has
performance implications.
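
As a rough back-of-the-envelope check of the figures in 2a, the sketch below multiplies out the 13 member variables named above at 4 bytes each. It is only an estimate under stated assumptions: the class name, the node count, and the omission of JVM object headers and subclass fields are all illustrative choices, not measurements of Xerces.

```java
class DomNodeCost {
    // The 13 per-node member variables named in the message.
    static final String[] FIELDS = {
        "ownerDocument", "parentNode", "previousSibling", "nextSibling",
        "name", "value", "readOnly", "userData", "firstChild", "lastChild",
        "syncChildren", "syncData", "changes"
    };

    // "all 32 bits each", as stated in the message.
    static final int BYTES_PER_FIELD = 4;

    // Bytes spent on these fields alone; object headers and
    // subclass fields are deliberately left out (assumption).
    static long fieldBytes(long nodeCount) {
        return nodeCount * FIELDS.length * BYTES_PER_FIELD;
    }

    public static void main(String[] args) {
        long nodes = 1_000_000; // hypothetical document size
        System.out.println(fieldBytes(1) + " bytes of fields per node");
        System.out.println(fieldBytes(nodes) + " bytes for " + nodes + " nodes");
    }
}
```

So the listed fields alone come to 52 bytes per node before any character data is stored, which is why whitespace-only text nodes are so costly in a generic DOM.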

2b) Memory.  DOM text() nodes, comment nodes, etc., store and return their
data in the form of Java Strings.  Strings are evil performance roadblocks
in Java.  SAX, on the other hand, can pass the data via char arrays.  If
you have a very smart parser, in some cases it is possible to pass a direct
reference of the source buffer (if no entities occur), dramatically
improving performance and memory consumption.
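
To make the char-array point in 2b concrete, here is a minimal sketch using the standard SAX `characters(char[], int, int)` callback through the JDK's JAXP parser. The class name and the counting task are made up for illustration; the point is only that the handler reads the parser's buffer slice in place and never asks for a String.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CharArrayCounter extends DefaultHandler {
    private long nonWhitespace;

    @Override
    public void characters(char[] ch, int start, int length) {
        // Read the slice [start, start + length) of the parser's buffer
        // directly; no String object is created for the text content.
        for (int i = start; i < start + length; i++) {
            if (!Character.isWhitespace(ch[i])) {
                nonWhitespace++;
            }
        }
    }

    // Parses the given XML and returns the number of non-whitespace
    // characters seen in text content.
    static long count(String xml) {
        try {
            CharArrayCounter handler = new CharArrayCounter();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.nonWhitespace;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("<doc><p>SAX vs</p> <p>DOM</p></doc>"));
    }
}
```

A DOM consumer doing the same job would have to call getNodeValue() or getData() on each text node, receiving a String every time.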

3) Latency.  Any dynamic rendering or transformation on the server needs to
get bytes out to the pipe as soon as possible.  For the source tree, the
transformation needs to occur while the parse is still going on, if
possible (for some types of transformation, it is not possible).  While
this is somewhat possible with a special-purpose DOM, it is not possible
with a generic DOM... which negates the reason to use a generic DOM API in
the first place.  For the result tree, the bytes need to feed to the pipe
as soon as they occur.
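
The latency argument in 3 can be illustrated with a small SAX handler that produces output as each element closes, rather than after the whole document has been read. This is a sketch only: the `row` element name and the log list standing in for "the pipe" are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EarlyEmitter extends DefaultHandler {
    // Stands in for the output pipe; records the order in which things happen.
    final List<String> log = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        // Emit as soon as a <row> closes, while the parse is still running.
        if (qName.equals("row")) {
            log.add("emit " + text);
        }
    }

    @Override
    public void endDocument() {
        log.add("parse finished");
    }

    static List<String> run(String xml) {
        try {
            EarlyEmitter handler = new EarlyEmitter();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.log;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        run("<rows><row>a</row><row>b</row></rows>").forEach(System.out::println);
    }
}
```

Every "emit" entry is logged before "parse finished", which is the ordering a generic DOM cannot offer, since the tree is only handed over once the parse is complete.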

4) Piped transforms.  Related to #3, I believe Cocoon has the requirement
to feed the results of one transform into the source tree of another
transform.  The only way to do this and have even a hint of good
performance is with SAX (sorry, I don't believe you can do this with a
"smart" DOM, since a DOM node has no endElement event).
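
For the piped-transform case in 4, the following sketch chains two XSLT stages through SAX events using the JAXP `SAXTransformerFactory` and `TransformerHandler` interfaces, which postdate this message; it is not Cocoon's or Xalan's API of the time. The two stylesheets and the element names are hypothetical. Note that whether each stage actually transforms incrementally depends on the XSLT processor; the sketch only shows that the stages hand results to each other as SAX events, with no DOM built by the calling code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxPipeline {
    static final String XSL_HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Stage 1 (hypothetical): rename <item> to <entry>.
    static final String STAGE1 = XSL_HEAD
        + "<xsl:template match='/list'><list><xsl:apply-templates/></list></xsl:template>"
        + "<xsl:template match='item'><entry><xsl:value-of select='.'/></entry></xsl:template>"
        + "</xsl:stylesheet>";

    // Stage 2 (hypothetical): render entries as a <ul> list.
    static final String STAGE2 = XSL_HEAD
        + "<xsl:template match='/list'><ul><xsl:apply-templates/></ul></xsl:template>"
        + "<xsl:template match='entry'><li><xsl:value-of select='.'/></li></xsl:template>"
        + "</xsl:stylesheet>";

    static String run(String xml) {
        try {
            SAXTransformerFactory tf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler first =
                tf.newTransformerHandler(new StreamSource(new StringReader(STAGE1)));
            TransformerHandler second =
                tf.newTransformerHandler(new StreamSource(new StringReader(STAGE2)));
            second.getTransformer()
                .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            // Stage 1 feeds its result tree to stage 2 as SAX events;
            // stage 2 serializes to the writer.
            StringWriter out = new StringWriter();
            first.setResult(new SAXResult(second));
            second.setResult(new StreamResult(out));

            // The parser drives the whole chain.
            SAXParserFactory pf = SAXParserFactory.newInstance();
            pf.setNamespaceAware(true);
            XMLReader reader = pf.newSAXParser().getXMLReader();
            reader.setContentHandler(first);
            reader.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<list><item>a</item><item>b</item></list>"));
    }
}
```

The endDocument event from one stage is what tells the next stage that its source tree is complete, which is the signal a DOM node has no equivalent for.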

I think there's more, but this should be justification enough.  Over time,
content/presentation separation transformation architectures based on XML
will fail unless they perform.  From what I know of Cocoon (which is not
enough), it does not constrain itself to small documents, and, indeed, any
document architecture should be able to handle fair sized documents.  If
you have incremental transforms from the server with reasonable load, along
with incremental transforms on the client, the user should not notice the
difference between Cocoon and a server with static HTML pages.


Stefano Mazzocchi <>
01/23/00 05:57 PM

To:      Cocoon <cocoon-dev@xml.apach>
cc:      John Milan <jmilan@DataChannel.com>, "Clark C. Evans"
         <clark.evans@manhatta>, Ted Leung <>, Scott Boag
         <Scott_Boag/CAM/Lotus>
Subject: [Moving on] SAX vs. DOM part II

Hello people,

[note: I CC'ed all the people that should be involved in this discussion
which I find critical for the evolution of the Cocoon project. Sorry for
those of you that are also subscribed to the Cocoon-dev mail list, but I
want you to be named so that others know who you are and the role that
you have]

I'd like to introduce to you the people on this microforum:

- John Milan, is a software architect currently working for DataChannel.
He'll play the DOM expert and DOM advocate role since he helped create
and design the DataChannel virtual DOM implementation. John contacted
me before Xmas investigating possible integrations of their DOM
implementation into Cocoon as an official donation. I waited until 1.6
was released. Now it's time to talk about it a little more.

- Clark C. Evans, is a crazy and brilliant guy that works for the
Gardner Group (or had worked for.. sorry I don't know your status right
now :)... he fell in love with Cocoon when I showed him its power last
May during first Exolab in France. Since then, Clark has been very
active in both this list and xml-dev and the xsl list, also proposing an
alternative to XML called YML. Clark was the very first to outline the
problems of the DOM model for big files, also advocating for XSLT
incremental operativity. Here, he'll play the SAX-DOM hybrid advocate
and something else, I'm sure :)

- Ted Leung, is one of the key software engineers at the XML team in
IBM, he's one of the makers of the Xerces parser and he'll play the role
of the man that worked with both SAX and DOM. I hope he'll bring
knowledge about integrating the two and how Cocoon integration with
Xerces can be smoother, faster and more useful for everybody.

- Scott Boag, is a software engineer at Lotus, author of the Xalan XSLT
processor, member of the XSLT WG at W3C. In this discussion, he'll play
the XSLT expert role as well as parser-liaison expert. Scott and I had
frequent and productive discussions about better APIs for LotusXSL and
now Xalan, but, as he recently expressed to me privately, we need more
integration between the projects. This discussion wants
to clear out the problems and start a continuous dialog that makes
Cocoon benefit more and more from the close collaboration with such
powerful and well implemented software.

but let me assign other roles for the people already on this list:

- Ricardo Rocha, he'll play the dynamic XML guru role.

- Pierpaolo Fumagalli, he'll play the static XML guru role.

- myself, I'll play the "all right, all right, but let's come up with a
working solution" role. :)

Ok, but what is this discussion about?

You all read the Cocoon2 proposal where I outlined the problems in
current Cocoon architecture. Some of you don't like that proposal, some
of you liked it before but changed your mind, myself, I changed my mind
so many times I don't know what to do.

While the DOM model is not posing that many limitations on dynamic
operation (Cocoon is not generally used to generate mb-long web pages),
it is on static operation (I'm talking about Stylebook at this point,
but you should consider Cocoon2 = Cocoon1 + Stylebook) where mb-long PDF
reports are not that far from being considered.

On the other hand, key issues about web operation (like content-length,
expiration headers and such) or internal operation pose a great deal of
problems when the DOM model is abandoned.

This discussion should be focused on answering this question:

"what is the best architecture for Cocoon2?"

in answering this question, we should consider both dynamic and static
operativity, performance, memory usage, scalability, availability of
implementations, degree of standardization, degree of usability, ease of
use, cost of operation and time to market of a possible solution.

Also, I would like you to focus on practical considerations rather than
theoretical approaches, so, to prevent "pindaric flights", I fix some

1) the adoption of W3C standards is not under discussion. We should work
with what is standardized "today". Proposals that rely on
yet-to-be-finalized features or new ideas will be evaluated one by one,
but as a general rule, we should play with the rules we already have.

2) nothing in the Cocoon architecture is carved in stone. Even less, the
cocoon2 proposal. We are open to all kinds of suggestions and I'm
willing to undertake a major code rewriting if the benefits are evident
and long lasting.

3) this discussion is about internal architecture, and should not deal
with other issues such as XSP vs JSP, or XSP vs. XSLT-extensions, or
producer vs. processor, or Xalan vs. XT or anything like that. Let's
remain focused on the underlying architecture, everything else will be
dealt with when this is resolved.

4) this discussion will be orthogonal to the sitemap design, meaning
that the sitemap will not make assumptions on the underlying API
architecture used inside Cocoon.

Ok, I'll start with my personal and very brief comment:

"I like DOM because I'm lazy and I don't want to rewrite Cocoon, but I
also know that Pier needs SAX support for static operation and we need
better links between Cocoon and X*L components than DOM 1 provides. I'd
be glad to make everyone happy without rewriting the whole thing,
removing this debate once and forever"

Now, tell me what you think and I warn you: if you don't speak up now,
I'll continue with what we have today since I'm happy about it, I don't
have the itch to scratch and I'm lazy :)

[NOTE: this discussion is open to _everyone_ of course, even those
not listed above. Also, you are not forced to play the roles that I
assigned you up above. I just wanted to kick you in.]

Your turn, now. Let the discussion begin.

Stefano Mazzocchi      One must still have chaos in oneself to be
                         able to give birth to a dancing star.
<>                             Friedrich Nietzsche
Come to the first official Apache Software Foundation Conference!
------------------------- http://ApacheCon.Com ---------------------
