xml-general mailing list archives

From Scott_B...@lotus.com
Subject Re: JDOM in Apache (was Re: xml.apache.org charter proposal)
Date Thu, 05 Apr 2001 16:20:41 GMT

> Are
> there any thoughts towards being able to edit the tree?

Yes.  There is "construction", i.e. append at the end of the DTM (in the
XSLT processor, we need to build trees on the fly, called Result Tree
Fragments).  If the parser constructs the tree, though, it probably doesn't
need to support these methods.  There has also been talk of a DTMWriteable
interface that derives from DTM.  But the DTM should be optimized as a
read-only tree, with the writeable stuff as an afterthought.  Insert
operations are going to be a real pain in an array-based model: they will
probably invalidate any handle that occurs after the insertion point,
unless some ugly magic is done.
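To make the handle-invalidation problem concrete, here is a minimal sketch
(not the real DTM API; the class and method names are invented for
illustration) of an array-based tree where a node handle is just its index:

```java
import java.util.Arrays;

// Hypothetical sketch: nodes live in parallel arrays and a node
// "handle" is simply its index into those arrays.
public class ArrayTreeSketch {
    int[] parent = new int[8];  // parent[i] = handle of node i's parent
    int size = 0;

    // Appending at the end never disturbs existing handles.
    int appendChild(int parentHandle) {
        if (size == parent.length) parent = Arrays.copyOf(parent, size * 2);
        parent[size] = parentHandle;
        return size++;
    }

    // Inserting shifts every later node up by one slot, so any handle >= pos
    // that a caller is still holding now points at the wrong node.  Patching
    // up all the stored references is the "ugly magic" mentioned above.
    int insertBefore(int pos, int parentHandle) {
        if (size == parent.length) parent = Arrays.copyOf(parent, size * 2);
        System.arraycopy(parent, pos, parent, pos + 1, size - pos);
        parent[pos] = parentHandle;
        size++;
        return pos;
    }
}
```

Note that appendChild stays cheap and handle-stable, which is why a
construction-only interface is so much easier than general editing.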

> What I really
> want to know is how much performance do you expect to gain
> by allowing the string values to be "chunked"?

Hmm... well, I expect to gain a lot of performance by having the DTM manage
the character buffers.  In our experience so far, to avoid
realloc-larger-buffer-then-copy-from-original-buffer operations, the data
from any given text node needs to be chunked, even though the XPath/XQuery
data model sees those chunks as a single text node.
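A minimal sketch of the idea (the class name and chunk size are my own,
not from the DTM): text accumulates in fixed-size chunks, so growing the
buffer allocates a new chunk instead of reallocating and copying everything
already stored, while callers still see one contiguous logical string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a chunked character store that never copies
// previously stored characters when it grows.
public class ChunkedCharBuffer {
    static final int CHUNK = 1024;
    private final List<char[]> chunks = new ArrayList<>();
    private int usedInLastChunk = CHUNK;  // forces allocation on first append

    public void append(char[] src, int off, int len) {
        while (len > 0) {
            if (usedInLastChunk == CHUNK) {      // last chunk full: add one,
                chunks.add(new char[CHUNK]);     // no copy of older chunks
                usedInLastChunk = 0;
            }
            int n = Math.min(len, CHUNK - usedInLastChunk);
            System.arraycopy(src, off, chunks.get(chunks.size() - 1),
                             usedInLastChunk, n);
            usedInLastChunk += n;
            off += n;
            len -= n;
        }
    }

    public int length() {
        return chunks.isEmpty() ? 0
                : (chunks.size() - 1) * CHUNK + usedInLastChunk;
    }

    // One logical text node may span chunk boundaries; callers see one value.
    public String substring(int start, int len) {
        StringBuilder sb = new StringBuilder(len);
        for (int i = start; i < start + len; i++) {
            sb.append(chunks.get(i / CHUNK)[i % CHUNK]);
        }
        return sb.toString();
    }
}
```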

> Perhaps I just need some clarification about the chunking.
> For example, does the code constructing the tree need to
> orphan its character buffers to the DTM in order for the
> chunking to work?

I don't think I understand this question.  The management of the DTM
character buffers is totally up to the DTM.

> Or is this chunking just for the internal
> DTM use to store text?


I should note that so far in the conversion of Xalan to this interface,
I've not yet used the chunking directly... I always just call
dispatchCharactersEvents to get the characters to a ContentHandler.  But
down the line, as we further reduce string usage and develop an all-pull
pipeline (the ability to do a DTM pull as the XSLT result), we may use
these methods more.
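The shape of that dispatch is roughly the following sketch (the signature
is invented for illustration; only the method name dispatchCharactersEvents
and the SAX ContentHandler come from the text): each chunk goes out as its
own characters() event, so no single String is ever assembled.

```java
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Hypothetical sketch: stream a text node's chunks straight to a SAX
// ContentHandler, one characters() call per chunk.
public class TextDispatcher {
    public static void dispatchCharactersEvents(List<char[]> chunks,
                                                int totalLength,
                                                ContentHandler ch)
            throws SAXException {
        int remaining = totalLength;
        for (char[] chunk : chunks) {
            if (remaining <= 0) break;
            int n = Math.min(remaining, chunk.length);
            ch.characters(chunk, 0, n);  // no concatenation, no copy
            remaining -= n;
        }
    }
}
```

This is why the chunk boundaries can stay invisible downstream: SAX already
permits a text node to arrive as multiple characters() events.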

> I want to make sure that the things we add for performance
> reasons will actually improve performance and not just move
> work from the data model into the parser.

What I'm trying to achieve is the reduction of character copying as much as
possible.  In performance profiles of XalanJ2, this is a major issue.  The
work we've done in Stree confirms this.

> Also, what about the setParseBlockSize method? What's that
> doing there?

This is a bit more of a half-baked thought.  The idea is to let the calling
application muck around with the buffer sizes that the parser uses... i.e.
does it use 1K, 4K, 8K, or 32K blocks?  I thought that with a pull-model
parser these blocks would define the granularity of tree construction.  It
seems to me that in some cases a smaller block would be better, while in
others a larger block might be better.  You would know more about this than
I.  It was just a thought... I'm not religious about keeping it.
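As a sketch of what the knob would control (the class and everything but
the method name setParseBlockSize are my own invention), the block size
just sets how much input an incremental parser consumes per pull step:

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical sketch: a configurable read-block size controlling the
// granularity of incremental (pull) parsing.
public class BlockReader {
    private int parseBlockSize = 8 * 1024;

    public void setParseBlockSize(int size) {
        parseBlockSize = size;
    }

    // Returns how many pull steps it takes to drain the input; a smaller
    // block means more, finer-grained steps of tree construction.
    public int drain(Reader in) throws IOException {
        char[] block = new char[parseBlockSize];
        int steps = 0;
        while (in.read(block, 0, block.length) != -1) {
            steps++;
        }
        return steps;
    }
}
```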

> I don't like using runtime exceptions as the exception
> model. I would prefer to use an explicit exception class.
> In XNI, we've agreed to use the SAX exceptions throughout
> the framework.

Ah.  The problem with checked exceptions is that they ripple throughout the
entire framework, and in my experience I'm not sure to what real benefit.
I wouldn't be too upset about using checked exceptions, or even the SAX
exceptions.  Other people I know would much prefer runtime exceptions.
But, to tell you the truth, I don't know what the answer is here.  Runtime
exceptions are awfully ad hoc.  I would be very interested in getting lots
of folks' opinions about this.
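The "ripple" cost is easy to see in a small sketch (the class, exception,
and method names here are invented for illustration): once a leaf API
throws a checked exception, every caller up the stack must declare or
handle it, even pure pass-through methods.

```java
// Hypothetical sketch of how checked exceptions ripple through a framework.
public class ExceptionRipple {
    static class DTMException extends Exception {
        DTMException(String msg) { super(msg); }
    }

    // The leaf API throws a checked exception...
    static String getNodeValue(int handle) throws DTMException {
        if (handle < 0) throw new DTMException("bad handle: " + handle);
        return "value";
    }

    // ...so this method, which only forwards the call, must still declare
    // it; so must its callers, and theirs.  With a RuntimeException both
    // signatures would stay clean, but the compiler would no longer force
    // callers to think about the failure.
    static String format(int handle) throws DTMException {
        return "[" + getNodeValue(handle) + "]";
    }
}
```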


