commons-dev mailing list archives

From Reid Pinchback <>
Subject Re: [digester] initial code for Digester2.0
Date Thu, 03 Feb 2005 15:52:52 GMT

--- Simon Kitching <> wrote:

> On Wed, 2005-02-02 at 20:45 -0800, Reid Pinchback wrote:
> Of course if someone can demonstrate that non-namespace-aware parsers
> *are* still useful then I'll change my mind.

Just to clarify, since I was being sloppy before (I gotta
stop typing in shorthand), there are four cases worth distinguishing:

a) having NS-aware parser, always using NS-aware API methods
b) having NS-aware parser, selectively using NS-aware API methods
c) having non-NS-aware parser (and obviously never using NS-aware API methods)
d) having NS-aware parser where the developer fixes a grammar that
   ignores any NS distinctions
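
To make the first three cases concrete, here is a minimal JAXP/SAX sketch,
not anything taken from Digester. The class name NamespaceModes and the
sample document are made up for illustration. With namespaceAware=true the
handler receives the namespace URI and local name (case a). A handler that
only ever looks at qName on that same parser is roughly case (b). With
namespaceAware=false only qName is meaningful (case c).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class NamespaceModes {

    /** Parses xml and records {uri, localName, qName} for each start tag. */
    static List<String[]> parse(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // false is the JAXP default
        SAXParser parser = factory.newSAXParser();

        List<String[]> seen = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen.add(new String[] { uri, localName, qName });
            }
        });
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p:item xmlns:p='urn:example'/>"; // made-up sample document
        for (boolean aware : new boolean[] { true, false }) {
            String[] names = parse(xml, aware).get(0);
            System.out.println("namespaceAware=" + aware
                    + " uri=" + names[0]
                    + " localName=" + names[1]
                    + " qName=" + names[2]);
        }
    }
}
```

Case (d) depends on parser internals and is not covered by this sketch.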

Even for SAX the performance difference between (a) and (b) is roughly
a factor of 2 across all parsers when processing small (typical message-sized)
docs that don't use NS.  Mucking with (d) is supposed to result in significant
wins when you tune the grammar handling to your app, but I haven't tried it
myself and I've never seen timing differences quoted.

I'm not trying to advocate any approach except to notice that, since your 
README mentioned requiring a namespace-aware parser, it sounded like 
there was a potential for options (b), (c), and (d) to become unintentionally
closed to developers in Digester2 when they weren't in Digester1.  I agree
that old parsers providing (c) aren't particularly interesting, but
if you spend any time tracing through the guts of the parsing, particularly
when you see how DTDs are loaded for entity resolution, you begin to see 
(d) as having potential.  Throwing (b) away may result in less code in
Digester2, but it may be worth doing some timing tests to see if that 
code reduction is consequence-free.
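
A timing test along those lines could start from something like the sketch
below. It is deliberately naive: the class name NsTiming, the sample
document, and the iteration counts are all assumptions, there is only a
crude warm-up, and it measures just the parser with a do-nothing handler,
not Digester itself. Treat any numbers it prints as a rough indication at
best.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical micro-benchmark, for illustration only.
class NsTiming {

    /** Returns elapsed nanoseconds for parsing xml the given number of times. */
    static long time(String xml, boolean namespaceAware, int iterations) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        // An XMLReader may be reused once a parse has completed.
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler()); // no-op handler

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            reader.parse(new InputSource(new StringReader(xml)));
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        // Small message-sized document that uses no namespaces (made up).
        String xml = "<order><id>42</id><item sku='a1' qty='2'/></order>";
        int warmup = 2000;
        int runs = 20000;

        // Crude warm-up so the JIT has seen both code paths before timing.
        time(xml, true, warmup);
        time(xml, false, warmup);

        long aware = time(xml, true, runs);
        long plain = time(xml, false, runs);
        System.out.println("namespace-aware:     " + aware / runs + " ns/doc");
        System.out.println("non-namespace-aware: " + plain / runs + " ns/doc");
    }
}
```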

> I still find it hard to believe that leaving out namespace support makes
> a performance difference. The parser needs to keep a map of
>    prefix->(stack of namespace)
> and that's about it. 
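
For reference, the bookkeeping described in the quoted text could be
sketched roughly as below. PrefixScopes is a made-up name and this is not
how any particular parser implements it. It only shows the
prefix -> (stack of namespace URIs) idea, where an inner declaration
shadows an outer one until its element closes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a prefix -> stack-of-URIs map.
class PrefixScopes {
    private final Map<String, Deque<String>> bindings = new HashMap<>();

    /** Called when an element declares xmlns:prefix="uri". */
    void declare(String prefix, String uri) {
        bindings.computeIfAbsent(prefix, k -> new ArrayDeque<>()).push(uri);
    }

    /** Called when the declaring element ends; restores the outer binding. */
    void undeclare(String prefix) {
        Deque<String> stack = bindings.get(prefix);
        if (stack != null && !stack.isEmpty()) {
            stack.pop();
        }
    }

    /** Returns the URI currently bound to prefix, or null if unbound. */
    String resolve(String prefix) {
        Deque<String> stack = bindings.get(prefix);
        return (stack == null || stack.isEmpty()) ? null : stack.peek();
    }
}
```

The declare and undeclare calls correspond loosely to the
startPrefixMapping and endPrefixMapping events that SAX reports.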

Actually the XML spec distinguishes between the default namespace
and all other namespaces, so parsers can reasonably make the same
distinction and try to avoid a bunch of per-entity operations and 
temporary object creations in the case where there is no namespace.
Look at the Piccolo stats published on SourceForge.  Compare the Soap,
Soap+NS, and random XML-no NS timings; they suggest that NS
ain't free.

Useful links:

  Jade (now part of Javolution),
  look at the javolution.xml package (trades String for CharSequence
  to increase performance, but keeps NS)

  Piccolo you probably already have the link for, but for anybody
  else interested:

  Zapthink comments on XML parsing challenges,,289142,sid26_gci858888,00.html

  Developerworks articles on XML performance,

  Sun articles on XML performance,
