cocoon-dev mailing list archives

From Stefano Mazzocchi <>
Subject Re: [ANN] VTD-XML Version 1.5 Released
Date Mon, 20 Feb 2006 19:21:31 GMT
Jimmy Zhang wrote:
> Hi, Thanks for the email.
> My answers to your questions:
> 1. It is a tradeoff: VTD-XML consumes more memory, but
> is easy to use and more powerful. Any random-access-capable XML 
> processing API *needs* to at least load the entire hierarchical structure 
> in memory. My take is that among SAX, StAX, DOM
> and JDOM, VTD-XML is the least likely one to choke, and the best one
> to handle peak loads...
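For comparison, the standard JAXP route to random access is to build a full DOM tree and then query it with XPath; the whole tree exists in memory before the first query runs. A minimal sketch; the class name and the sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DomRandomAccess {
    // Parses the entire document into an in-memory DOM tree, then evaluates
    // an XPath expression against that tree and returns the number of hits.
    static int countMatches(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><item sku='a'/><item sku='b'/><note/></order>";
        System.out.println(countMatches(xml, "/order/item")); // prints 2
    }
}
```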


Most XSLT cases *DO NOT* need to load the XML in memory to be able to 
process it. Unless you abuse xsl:sort or XPaths with "..", most things can 
be done in a pure event-driven, pipeline style, and only a small buffer 
needs to be kept in memory.
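To make the event-driven point concrete, here is a minimal SAX sketch (plain JAXP, not XSLT) that computes its result in a single pass while holding only a running total, so memory use does not depend on document size. The `item` element and `price` attribute are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class StreamingTotal {
    // Sums the "price" attribute of every <item> element as start-element
    // events arrive; no tree is built and only the running total is kept.
    static long totalPrice(String xml) {
        long[] total = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("item".equals(qName)) {
                    String price = atts.getValue("price");
                    if (price != null) {
                        total[0] += Long.parseLong(price);
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return total[0];
    }

    public static void main(String[] args) {
        String xml = "<order><item price='3'/><item price='4'/></order>";
        System.out.println(totalPrice(xml)); // prints 7
    }
}
```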

Xalan's XSLTC can pre-process XSLT stylesheets and compile them 
into code that knows how much buffer to keep, because it knows which 
kinds of XPath events will be triggered on the incoming stream.
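The compile-once side of this is visible through the standard JAXP API: `TransformerFactory.newTemplates` compiles a stylesheet into a reusable `Templates` object (the JDK's built-in processor is derived from XSLTC). A minimal sketch with a toy stylesheet; whether the input is streamed or buffered is decided by the processor, not by this code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CompiledStylesheet {
    // Toy stylesheet: emits "Hello, " followed by the value of /greeting/@to.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>Hello, <xsl:value-of select='greeting/@to'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Compiled once; a Templates object is thread-safe and can be reused.
    static final Templates TEMPLATES = compile(XSL);

    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Each call creates a lightweight Transformer from the precompiled Templates.
    static String apply(String xml) {
        try {
            Transformer t = TEMPLATES.newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply("<greeting to='Cocoon'/>")); // prints Hello, Cocoon
    }
}
```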

> 2. I agree with you: benchmarking against a dummy SAX parser is unfair to VTD-XML;
> a real-life scenario would make VTD-XML look even better.

As for #2, whatever: playing smartass (and avoiding the issue that I mentioned) 
is unlikely to make your points more solid.

> 3. Look at all the vertical-industry XML vocabularies, SOAP,
> REST, XML Schema, and the Infoset data model: DTDs seem somewhat deprecated,
> and VTD-XML doesn't support external entities... other than that,
> VTD-XML is equally capable.

I agree that DTDs should be deprecated; they look like an SGML vestige.

My point is that it's unfair to compare a fully compliant XML 
parser with a parser that knows how to cut corners (and therefore 
doesn't have to scan the text for entities to expand!).
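To illustrate the extra work a conforming parser must do, here is a minimal SAX sketch in which the parser has to find and expand both a user-declared internal entity (`&who;`, declared in the internal DTD subset) and a predefined one (`&amp;`). The entity name and the document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class EntityExpansion {
    // Collects the character data a conforming parser reports for the document,
    // after it has scanned the text for '&' and replaced entity references.
    static String text(String xml) {
        StringBuilder sb = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                sb.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE msg [<!ENTITY who 'Cocoon'>]>"
                   + "<msg>Hi &who; &amp; bye</msg>";
        System.out.println(text(xml)); // prints Hi Cocoon & bye
    }
}
```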

If Xerces were allowed to get away without parsing entities and 
didn't have to create strings, it would be just as fast as yours.
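The "no strings" shortcut refers to recording where a token sits in the input buffer instead of copying it out. A hypothetical sketch of that general idea: start-tag names are stored as packed offset/length values in a `long[]`, and a `String` is created only on demand. The record layout, class, and method names are assumptions, this is not VTD-XML's actual format or API, and the scanner deliberately ignores comments, CDATA, and most of XML's syntax.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class OffsetTokens {
    // Hypothetical layout: upper 32 bits = byte offset, lower 32 bits = length.
    static long pack(int offset, int length) {
        return ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }
    static int offset(long token) { return (int) (token >>> 32); }
    static int length(long token) { return (int) token; }

    // Records the position of every start-tag name without creating any Strings.
    // Naive on purpose: a '<' inside a comment or CDATA section would be misread.
    static long[] startTagNames(byte[] doc) {
        long[] tokens = new long[8];
        int count = 0;
        for (int i = 0; i < doc.length - 1; i++) {
            if (doc[i] != '<') continue;
            byte next = doc[i + 1];
            if (next == '/' || next == '?' || next == '!') continue; // end tag, PI, comment/DOCTYPE
            int start = i + 1;
            int end = start;
            while (end < doc.length && doc[end] != '>' && doc[end] != '/'
                   && doc[end] != ' ' && doc[end] != '\t' && doc[end] != '\n' && doc[end] != '\r') {
                end++;
            }
            if (count == tokens.length) tokens = Arrays.copyOf(tokens, count * 2);
            tokens[count++] = pack(start, end - start);
        }
        return Arrays.copyOf(tokens, count);
    }

    // A String is materialized only when a caller actually asks for one.
    static String text(byte[] doc, long token) {
        return new String(doc, offset(token), length(token), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<a><b x='1'/><c>t</c></a>".getBytes(StandardCharsets.UTF_8);
        for (long t : startTagNames(doc)) {
            System.out.println(text(doc, t)); // prints a, b, c on separate lines
        }
    }
}
```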

BTW, you have not answered these questions:

>> You claim XPath random access, but what is the algorithmic 
>> complexity of that? O(1), O(log(n)), O(n), O(n*log(n))? If one were to 
>> store the parsed tree index on disk, how many pages would one need to 
>> page in before reaching the required XPath?
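For scale, the two ends of the range the question points at, under the simplifying assumption of n fixed-size index records of r bytes each stored on pages of P bytes (these symbols and the sample numbers are introduced here, not by either poster): a full linear scan versus a balanced tree index with fanout f.

```latex
% Assumed: n records of r bytes each, page size P bytes, tree fanout f.
\text{linear scan: } \left\lceil \frac{n\,r}{P} \right\rceil \text{ pages}
\qquad
\text{balanced tree index: } \approx \left\lceil \log_f n \right\rceil \text{ pages}

% Example: n = 10^6, r = 8, P = 4096, f = 256
\left\lceil \frac{8 \cdot 10^6}{4096} \right\rceil = \lceil 1953.125 \rceil = 1954
\qquad \text{vs.} \qquad
\left\lceil \log_{256} 10^6 \right\rceil = \lceil 2.49 \rceil = 3
```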

