xerces-c-dev mailing list archives

From Dean Roddey <drod...@charmedquark.com>
Subject Re: DOM Performance
Date Thu, 01 Feb 2001 05:10:27 GMT
> An alternative DOM implementation that I would find very interesting is a
> DOM that does lazy charset transcoding. In other words the source document
> is read completely into memory (via memory mapped files if you have it).
> The parser then parses the document and builds a DOM using pointers to the
> source buffer plus a length. Then, only if the text node is accessed from
> the DOM is the string copied and converted to Unicode.
> This DOM implementation could have a big performance advantage in a system
> like Xalan. 99% of the time DOM strings are copied straight from the
> document to the output stream without the need for transcoding since most of
> the time the source and output documents are in the same charset. Note that
> this lazy transcoding system does not stop transcoding from happening, it
> just avoids it if possible. One nice side effect is a greatly reduced
> memory footprint allowed by memory mapping the input documents.

The Java parser/DOM does this, and I personally think it's more complexity
than it's worth, and it's worse when you really do end up touching most or all
of the document. It can create very good-looking benchmarks, but I'm not
convinced it's a real-world win overall. And of course it would require
rewriting the entire parser system.
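
For concreteness, here is a rough sketch of the kind of lazily transcoded
text node being proposed. It is not Xerces-C or Xalan code: the LazyText
class and the appendRaw helper are hypothetical names made up for
illustration, and it assumes the source bytes are ISO-8859-1 so that
"transcoding" is just widening each byte to a UTF-16 code unit. A real
implementation would have to dispatch on the document's actual encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch, not Xerces-C or Xalan API: a text node that keeps a
// (pointer, length) view into the raw source buffer and transcodes lazily.
// Assumes the source bytes are ISO-8859-1, so widening each byte yields the
// correct UTF-16 code unit; a real parser would dispatch on the encoding.
class LazyText {
public:
    LazyText(const char* src, std::size_t len) : src_(src), len_(len) {}

    // Raw bytes, untouched: usable when input and output charsets match.
    const char* rawData() const { return src_; }
    std::size_t rawLength() const { return len_; }

    // Transcode on first access only, then reuse the cached result.
    const std::u16string& unicode() {
        if (!transcoded_) {
            cache_.reserve(len_);
            for (std::size_t i = 0; i < len_; ++i) {
                cache_.push_back(static_cast<char16_t>(
                    static_cast<unsigned char>(src_[i])));
            }
            transcoded_ = true;
        }
        return cache_;
    }

    bool isTranscoded() const { return transcoded_; }

private:
    const char* src_;        // points into the source buffer, not owned
    std::size_t len_;
    std::u16string cache_;   // filled only if unicode() is ever called
    bool transcoded_ = false;
};

// Pass-through: copy the raw bytes to the output without transcoding.
inline void appendRaw(const LazyText& node, std::string& out) {
    out.append(node.rawData(), node.rawLength());
}
```

Even in this toy form, every node carries a flag and a cache on top of the
pointer and length, and the source buffer has to outlive the whole DOM. If
most nodes end up going through unicode() anyway, you pay for the
transcoding plus that bookkeeping.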

Dean Roddey
The CIDLib C++ Frameworks
Charmed Quark Software

"We're gonna need a bigger boat"
