commons-dev mailing list archives

From Simon Kitching <skitch...@apache.org>
Subject Re: [digester] initial code for Digester2.0
Date Thu, 03 Feb 2005 06:34:11 GMT
On Wed, 2005-02-02 at 20:45 -0800, Reid Pinchback wrote:
> --- Simon Kitching <skitching@apache.org> wrote:
> 
> > Supporting namespaces in an xml parser seems very simple to me. I think
> > it much more likely that only antique and unmaintained parsers fail to
> > support namespaces. And people who are determined to use antique and
> > unmaintained parsers can just stick with digester 1.x as far as I am
> > concerned. I'm not pushing for digester to remove non-namespace-aware
> > support - just digester2!
> 
> Wow, that is an unexpectedly harsh reaction.  My reason for asking 
> was simple, and I believe not unreasonable.   You were the one asking
> for feedback on your proposal. 

Sorry, Reid. I didn't mean it that way. I apologise for any offense.
I was just stating my personal opinion - that every app eventually drops
support for obsolete technologies, and I think it's time to drop support
for non-namespace-aware parsers. 

I was serious, too, about users of old technology sticking with digester
1.x. I'm aware that upgrading core libs can sometimes be a pain, but
digester 1.x is still there and isn't going away (I'm one of the
maintainers for that code, and have every intention of continuing that
even when 2.0 is out). I just don't see the point of migrating the
"backwards compatibility" code from the 1.x series. 

Of course if someone can demonstrate that non-namespace-aware parsers
*are* still useful then I'll change my mind.


> Using the namespace-based API of an XML parser is known to reduce throughput substantially;
> this is covered in a host of Java XML magazine articles, available from Google searches, and
> in one or two of the Java performance tuning books still in distribution.  XML
> performance tuning is a tough area, and people continually struggle with it.
> I don't recall the SAX-only stats, but I know that for DOM parsers you can
> shoot for an increase in XML processing bandwidth of an order of magnitude through
> a change in parser and not using NS.  Antiqueness of parsers isn't the issue.

Is there any chance you could provide a reference to such an article?

I still find it hard to believe that leaving out namespace support makes
a performance difference. The parser needs to keep a map of
   prefix->(stack of namespace)
and that's about it. Surely that's a fraction of a percent of the parser
performance, memory usage, and processing time. So why wouldn't a parser
do it?
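
A minimal sketch of that bookkeeping in Java, to make the size of the
job concrete. The class and method names (PrefixScopes, declare,
undeclare, resolve) are hypothetical and are not taken from any real
parser; an actual implementation would also have to handle the default
namespace and the reserved "xml" prefix.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration: prefix -> stack of namespace URIs. */
public class PrefixScopes {
    // For each prefix, the innermost (most recent) declaration is on top.
    private final Map<String, Deque<String>> bindings = new HashMap<>();

    /** Called when an element declares xmlns:prefix="uri". */
    public void declare(String prefix, String uri) {
        bindings.computeIfAbsent(prefix, k -> new ArrayDeque<>()).push(uri);
    }

    /** Called when the element that declared the prefix is closed. */
    public void undeclare(String prefix) {
        Deque<String> stack = bindings.get(prefix);
        if (stack != null && !stack.isEmpty()) {
            stack.pop();
        }
    }

    /** Returns the URI currently bound to the prefix, or null if unbound. */
    public String resolve(String prefix) {
        Deque<String> stack = bindings.get(prefix);
        return (stack == null || stack.isEmpty()) ? null : stack.peek();
    }

    public static void main(String[] args) {
        PrefixScopes scopes = new PrefixScopes();
        scopes.declare("a", "urn:outer");
        scopes.declare("a", "urn:inner");          // nested element rebinds "a"
        System.out.println(scopes.resolve("a"));   // prints urn:inner
        scopes.undeclare("a");                     // nested element closed
        System.out.println(scopes.resolve("a"));   // prints urn:outer
    }
}
```

Each element start costs one map lookup per prefixed name plus a push
for each new declaration, which is the basis for the "fraction of a
percent" guess above. It is only a guess until someone measures it.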

Leaving out *validation* would improve performance and footprint
significantly, but validation and namespace support are unrelated.

I had a quick look for high-performance/small-footprint xml parsers:
 parser        NS-support     maintained?
 Piccolo       y              y
 Aelfred       y              y
 ElectrixXML   y              n? (can't find a current website)
 MinML         n              n (last release Nov 2001)
 NanoXML       y              n (last release April 2002)
 JapiSoft      y              y (commercial)

I also googled for "xml parser performance namespace" but didn't get
anything relevant.

> I think it helps to keep in mind that NS was intended as a way of creating 
> name-resolution scopes that allow the merging of document structures from 
> different origins that otherwise could experience element and attribute
> name clashes.  When somebody has an application that doesn't require that 
> kind of merging, and they aren't using a namespace-dependent XML technology 
> like SOAP or XML Schema, then using the NS features of an NS parser can
> be a burden without corresponding benefit.  Under the hood, that parser has 
> to do a lot of work to continually manage the NS resolution of the node names.
> It has no way of knowing that the work is pointless - you've told it to
> assume that there is a point when you use the NS features.

True. Namespaces are not relevant in many contexts. But as noted above,
I do find it hard to believe that "parser has to do a lot of work to
manage NS resolution". If you can show me I'm wrong, I'll buy you a
beer ;-)
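
One way to settle this is to measure it. The following is a rough
sketch of a timing harness that uses the standard JAXP
SAXParserFactory.setNamespaceAware switch. The class name NsTiming, the
generated sample document, and the run counts are all assumptions made
for illustration. The numbers will depend heavily on which parser JAXP
picks up and on the JVM, so treat any result as indicative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical harness: same document, namespace awareness on vs. off. */
public class NsTiming {

    /** Counts elements so the parser has a handler to call. */
    static class Counter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    /** Builds a synthetic document with one prefixed element per item. */
    public static byte[] sampleDocument(int items) {
        StringBuilder sb = new StringBuilder("<root xmlns:x=\"urn:example\">");
        for (int i = 0; i < items; i++) {
            sb.append("<x:item id=\"").append(i).append("\">text</x:item>");
        }
        return sb.append("</root>").toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Parses the document once and returns the number of elements seen. */
    public static int parse(byte[] doc, boolean namespaceAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.setValidating(false);   // keep validation out of the comparison
            SAXParser parser = factory.newSAXParser();
            Counter counter = new Counter();
            parser.parse(new ByteArrayInputStream(doc), counter);
            return counter.elements;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Average wall-clock nanoseconds per parse over the given number of runs. */
    public static long averageNanos(byte[] doc, boolean namespaceAware, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            parse(doc, namespaceAware);
        }
        return (System.nanoTime() - start) / runs;
    }

    public static void main(String[] args) {
        byte[] doc = sampleDocument(10_000);
        // Warm up both paths so JIT compilation does not skew the first figure.
        averageNanos(doc, true, 20);
        averageNanos(doc, false, 20);
        System.out.println("namespace-aware:     " + averageNanos(doc, true, 50) + " ns/parse");
        System.out.println("not namespace-aware: " + averageNanos(doc, false, 50) + " ns/parse");
    }
}
```

This only exercises SAX with a trivial handler. The DOM case mentioned
in the quoted text would need a separate run with DocumentBuilderFactory.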

Regards,

Simon


