xml-general mailing list archives

From Tom Bradford <bradf...@dbxmlgroup.com>
Subject Re: [spinnaker] Announce
Date Wed, 12 Jul 2000 22:08:47 GMT
Scott Boag/CAM/Lotus wrote:
> The best optimizations often come from algorithmic tuning, rather than code
> tricks, and these kind of optimizations, especially at the stage that
> Xerces and Xalan are at, are helped, rather than hindered by clean,
> understandable code.  Knowing you, I'm pretty certain that we are in
> violent agreement on this.  On the other hand, there may be system level
> complexity introduced by these algorithmic tunings, like the use of string
> pools in Xerces.  Do you suggest that Xerces shouldn't use string pooling?
> From what I've heard and experienced, string pooling makes a major
> difference in the performance.  I would keep the string pooling for
> Xerces2, in spite of the increase in complexity, unless someone can prove
> it doesn't make a major difference.

To really optimize XML interactions in long-running systems, the best
way to gain overall performance is to avoid parsing more than is
necessary.  As part of the dbXML project, we've developed a compression
format that encapsulates a parsed document into a stream-based
traversable tokenized tree.  As far as our design was concerned, we
absolutely had to avoid parsing at all costs, and we had to avoid having
to produce an entire DOM tree graphed into memory if we were only
dealing with a subset of the nodes.  All interactions that would normally
be DOM-based, such as XPath, indexing, and modification, can be performed
against the compressed image.  We also use collection-based Symbol tables for
centralizing the string representation for the element/entity/attribute

The only overhead comes when you're converting the DOM back into
text form for output, though you'd have to do that with any DOM
implementation.  Fortunately, this conversion is typically only the
output of an operation, and such a small piece of the overall puzzle
that it doesn't kill us to do it.  We're also looking at a way to make
the Symbol Table/Stream Format portable so that a client/consumer can do
the textual conversion of the tokenized stream.
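
As a rough illustration of the general idea (this is not dbXML's actual
format, which isn't described in this message), here is a minimal Python
sketch.  Everything in it is an assumption made for the example: the
SymbolTable class, the START/END/TEXT token kinds, and the tokenize,
texts_of, and to_text helpers are all hypothetical names.  The document is
parsed once into a flat token stream whose names are small integers from a
shared symbol table, a query is answered by scanning the stream without
building a tree, and text is regenerated only on output.

```python
import xml.etree.ElementTree as ET          # used once, to build the stream
from xml.sax.saxutils import escape, quoteattr

START, END, TEXT = 0, 1, 2                  # hypothetical token kinds


class SymbolTable:
    """Maps each distinct element/attribute name to a small integer id."""

    def __init__(self):
        self.ids = {}
        self.names = []

    def intern(self, name):
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]


def tokenize(xml_text, symbols):
    """Parse the text once and flatten it into a list of tokens."""
    tokens = []

    def walk(elem):
        attrs = tuple((symbols.intern(k), v) for k, v in elem.attrib.items())
        tokens.append((START, symbols.intern(elem.tag), attrs))
        if elem.text:
            tokens.append((TEXT, elem.text))
        for child in elem:
            walk(child)
            if child.tail:
                tokens.append((TEXT, child.tail))
        tokens.append((END,))

    walk(ET.fromstring(xml_text))
    return tokens


def texts_of(tokens, symbols, name):
    """Collect text directly inside elements called `name` by scanning
    the stream; no tree is ever built in memory."""
    target = symbols.ids.get(name)
    stack, found = [], []
    for tok in tokens:
        if tok[0] == START:
            stack.append(tok[1])
        elif tok[0] == END:
            stack.pop()
        elif stack and stack[-1] == target:
            found.append(tok[1])
    return found


def to_text(tokens, symbols):
    """Convert the token stream back to XML text (the output-only step)."""
    out, stack = [], []
    for tok in tokens:
        if tok[0] == START:
            name = symbols.names[tok[1]]
            attrs = "".join(
                f" {symbols.names[k]}={quoteattr(v)}" for k, v in tok[2]
            )
            out.append(f"<{name}{attrs}>")
            stack.append(name)
        elif tok[0] == END:
            out.append(f"</{stack.pop()}>")
        else:
            out.append(escape(tok[1]))
    return "".join(out)


doc = '<order id="7"><item>pen</item><item>ink</item></order>'
symbols = SymbolTable()
tokens = tokenize(doc, symbols)             # the only parse
items = texts_of(tokens, symbols, "item")   # query against the stream
round_trip = to_text(tokens, symbols)       # text only when needed
```

Because to_text needs nothing but the token list and the symbol table's
name list, a client holding both could do the textual conversion itself,
which is the kind of portability mentioned above.  A real stream format
would presumably be a compact binary encoding rather than Python tuples.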

