commons-user mailing list archives

From robert burrell donkin <>
Subject Re: [OT] SAX & DOM was: digester + DOM
Date Wed, 26 Feb 2003 18:40:39 GMT
On Wednesday, February 26, 2003, at 04:08 PM, Erik Price wrote:
> James Strachan wrote:
>> Rather than using JTidy to parse HTML (which makes a DOM) you could use
>> NekoHTML which is-a SAX parser that can handle HTML. Then you don't need
>> to use a DOM.
> Sorry to hijack a thread like this, but I was curious -- if you're 
> building an in-memory representation of an XML document, is there still a 
> compelling reason to use a SAX parser? Or should you just use DOM in
> that case?

james can probably give you a pretty definitive answer to this question 
but here's my two penneth.

i think the answer depends on which in-memory representation you want.
DOM is a generic representation: different kinds of xml (e.g. documents
conforming to different schemas) are represented using the same objects.
this may be good or bad depending on the circumstances. if you're
interested in general xml then a general representation is best. but
there's more than DOM out there: there are several general representations
(e.g. dom4j) which offer more java-friendly APIs.
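To make the "same objects for every kind of xml" point concrete, here is a minimal sketch using only the JDK's JAXP/DOM classes. The class name GenericDomDemo and the sample documents are hypothetical. The walker never mentions a schema: an order document and a feed document both come back as plain org.w3c.dom.Node trees.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class: lists element names of ANY well-formed XML.
class GenericDomDemo {
    // Parses the whole document into a DOM tree, then walks it.
    static List<String> elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            collect(doc.getDocumentElement(), names);
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Depth-first walk; the same Node API works whatever the vocabulary.
    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), names);
        }
    }

    public static void main(String[] args) {
        // Two unrelated vocabularies, one set of DOM classes.
        System.out.println(elementNames("<order><item/><item/></order>"));        // [order, item, item]
        System.out.println(elementNames("<feed><entry><title>t</title></entry></feed>")); // [feed, entry, title]
    }
}
```

The flip side is that nothing here is typed: an "item" is just a Node, and any meaning has to be recovered by inspecting names and text.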

even when you're dealing with general representations, SAX (and therefore
digester, which is built on SAX) can have advantages over DOM. with SAX it
is easy to filter the event stream so that only the part of the object
model you're interested in is created. digester has a rule that creates
partial DOM object models which can be used in this way.
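The filtering idea can be sketched with the JDK's SAX parser alone. As far as I know, the digester rule referred to is NodeCreateRule; the sketch below does not use it or reproduce its implementation. It is a hand-written, hypothetical PartialDomHandler that turns SAX events into DOM nodes only while inside an element with a chosen name, and drops everything else as it streams past.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: builds DOM subtrees only for elements named `target`.
// Events outside those subtrees never become objects in memory.
class PartialDomHandler extends DefaultHandler {
    private final String target;
    private final Document doc;                 // factory for the captured nodes
    private final List<Element> captured = new ArrayList<>();
    private Node current;                       // null while outside a target subtree

    PartialDomHandler(String target, Document doc) {
        this.target = target;
        this.doc = doc;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null && !qName.equals(target)) {
            return; // not interesting: no node is created
        }
        Element el = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            el.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current == null) {
            captured.add(el);                   // root of a newly captured subtree
        } else {
            current.appendChild(el);
        }
        current = el;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null) {
            // Captured roots are never attached, so their parent is null and
            // we drop back to "not capturing" when the target element closes.
            current = current.getParentNode();
        }
    }

    static List<Element> capture(String xml, String target) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            PartialDomHandler handler = new PartialDomHandler(target, doc);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.captured;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

For example, capture("<log><noise>x</noise><entry id=\"1\"><msg>hi</msg></entry></log>", "entry") yields a single entry Element; the noise element is never materialised.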

on the other hand, a very common use case is having a particular object
model in mind which is represented by strongly typed java beans. in this
case, though the mapping is to an in-memory object model, there is a
considerable performance benefit (in both speed and memory) in using SAX
rather than DOM, since no intermediate DOM tree has to be built and then
thrown away. there are a number of technologies (e.g. castor, JAXB,
betwixt) which do this, and digester is also commonly used for this.
- robert
