xerces-c-dev mailing list archives

From "Jon Smirl" <jonsm...@mediaone.net>
Subject Re: DOM Performance
Date Thu, 01 Feb 2001 04:21:06 GMT
The C++ version of Xalan (i.e., the version currently in CVS) has recently
switched from the Xerces DOM to a newly written implementation. The new DOM
implementation is fully C++-oriented and uses standard strings. Right now it
is meant for internal use by Xalan, not as an externally accessible DOM, but
it might be a good starting point. Xalan DOM trees work in a multithreaded
environment, but each DOM tree allows only single-threaded access.

Removing the string and synchronization overhead of the Xerces DOM resulted
in performance gains of about 30% in Xalan transforms. I expect the gain to
be much higher on SMP machines, but that hasn't been measured yet.

An alternative DOM implementation that I would find very interesting is a
DOM that does lazy charset transcoding. In other words, the source document
is read completely into memory (via memory-mapped files, if the platform
supports them). The parser then parses the document and builds a DOM whose
text nodes hold a pointer into the source buffer plus a length. Only when a
text node is actually accessed through the DOM is its string copied and
converted to Unicode.
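
A minimal C++ sketch of that storage model, under stated assumptions: the names LazyTextNode and collectTextNodes are hypothetical (not Xerces or Xalan APIs), an ordinary in-memory string stands in for a memory-mapped file, and the source charset is assumed to be ISO-8859-1, so "conversion to Unicode" is a simple widening copy. Each text node keeps only a view (pointer plus length) into the buffer; the UTF-16 string is built and cached on first access.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch (not a Xerces or Xalan API): a text node that refers to
// its characters in the source buffer (pointer + length) and converts them to
// Unicode only on first access.
// Assumption: the source is ISO-8859-1, where each byte value equals its
// Unicode code point, so the conversion is a simple widening copy.
class LazyTextNode {
public:
    explicit LazyTextNode(std::string_view raw) : raw_(raw) {}

    // Raw bytes exactly as they appear in the source buffer; nothing is copied.
    std::string_view raw() const { return raw_; }

    // UTF-16 value, built the first time it is requested and cached after that.
    const std::u16string& value() const {
        if (!decoded_) {
            std::u16string out;
            out.reserve(raw_.size());
            for (unsigned char c : raw_) {
                out.push_back(static_cast<char16_t>(c));
            }
            decoded_ = std::move(out);
            // Under the ISO-8859-1 assumption, one byte maps to one code unit.
            assert(decoded_->size() == raw_.size());
        }
        return *decoded_;
    }

    bool isTranscoded() const { return decoded_.has_value(); }

private:
    std::string_view raw_;                           // view into the source buffer
    mutable std::optional<std::u16string> decoded_;  // filled lazily, no locking
};

// Hypothetical builder: records the character data between tags as views into
// the buffer. It produces a flat list rather than a tree and ignores entities,
// CDATA, comments, and attribute values containing '>'; it only illustrates
// the pointer-plus-length storage model.
std::vector<LazyTextNode> collectTextNodes(std::string_view doc) {
    std::vector<LazyTextNode> nodes;
    std::size_t pos = 0;
    while (pos < doc.size()) {
        std::size_t lt = doc.find('<', pos);
        std::size_t end = (lt == std::string_view::npos) ? doc.size() : lt;
        if (end > pos) {
            nodes.emplace_back(doc.substr(pos, end - pos));
        }
        if (lt == std::string_view::npos) break;
        std::size_t gt = doc.find('>', lt);
        if (gt == std::string_view::npos) break;
        pos = gt + 1;
    }
    return nodes;
}
```

The buffer must outlive the nodes, which is the same lifetime rule a memory-mapped design would impose. The cache is a mutable member with no locking, which fits the one-thread-per-tree access model described for the Xalan DOM but would need synchronization otherwise.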

This DOM implementation could have a big performance advantage in a system
like Xalan. 99% of the time, DOM strings are copied straight from the source
document to the output stream with no need for transcoding, since the source
and output documents are usually in the same charset. Note that this lazy
transcoding scheme does not eliminate transcoding; it just avoids it when
possible. A nice side effect is a greatly reduced memory footprint, made
possible by memory-mapping the input documents.
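
The passthrough idea can be sketched in C++ as well. This is a hypothetical illustration, not Xalan's serializer: Charset, TextWriter and latin1ToUtf8 are invented names, only ISO-8859-1 and UTF-8 are modeled, and a std::string stands in for the output stream. When the source and output charsets match, the raw bytes are appended unchanged; only a mismatch triggers transcoding.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical charset tags; only the two encodings used in this sketch.
enum class Charset { Latin1, Utf8 };

// Appends ISO-8859-1 bytes to `out` as UTF-8: bytes below 0x80 are copied,
// bytes 0x80..0xFF become two-byte sequences (U+0080..U+00FF).
void latin1ToUtf8(std::string_view in, std::string& out) {
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
}

// Hypothetical writer (not Xalan's serializer): raw source bytes go straight
// to the output when the charsets match; they are transcoded only otherwise.
class TextWriter {
public:
    TextWriter(Charset source, Charset output) : source_(source), output_(output) {
        // UTF-8 -> ISO-8859-1 can be lossy and is not modeled here.
        if (source_ == Charset::Utf8 && output_ == Charset::Latin1) {
            throw std::invalid_argument("UTF-8 to ISO-8859-1 not covered by this sketch");
        }
    }

    void writeText(std::string_view raw) {
        if (source_ == output_) {
            out_.append(raw.data(), raw.size());  // fast path: plain byte copy
            return;
        }
        // Only remaining combination, guaranteed by the constructor.
        assert(source_ == Charset::Latin1 && output_ == Charset::Utf8);
        latin1ToUtf8(raw, out_);
        ++transcodeCount_;
    }

    const std::string& result() const { return out_; }
    std::size_t transcodeCount() const { return transcodeCount_; }

private:
    Charset source_;
    Charset output_;
    std::string out_;  // stands in for the output stream
    std::size_t transcodeCount_ = 0;
};
```

The transcodeCount() accessor exists only to make the slow path visible; in the common case where both sides use the same charset it stays at zero and every text write is a single byte copy.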

Jon Smirl
