commons-dev mailing list archives

From Simon Kitching
Subject [xmlio] comparison with Digester
Date Sun, 10 Oct 2004 10:03:56 GMT

I've had a look at the new "xmlio" code in the sandbox; below is my
initial opinion. Note that I am a committer on Digester, and am
therefore totally biased....

The "in" code that is used to parse input xml documents is very similar
in concept to what Digester already does.

The main features appear to be:
A "callback handler list" (instead of a single ContentHandler object).

Well, this seems to me to be equivalent to the Digester concept of
having multiple Rule objects match a particular input element, i.e.
several distinct sections of code can all be triggered when a particular
element is matched.

A complete path to the current element is passed to the "startElement"
callback.

Digester has the "getMatch" method which can be called by any rule to
get the path to the current element. Xmlio does provide a SimplePath
instance instead of a plain string to represent this path (equivalent to
the File class wrapping a filename). However in Digester you don't
really need anything more complex than a string because you don't
normally do computations on paths anyway - you leave that up to the
"rule matcher" class.
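
To make the two features above concrete, here is a minimal sketch using
only the JDK's SAX classes. It is neither Digester's nor xmlio's actual
API; the class PathDispatchSketch and its on() and run() methods are
names made up for illustration. It tracks the path to the current
element as a plain string and lets several callbacks fire for the same
path:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: not the Digester or xmlio API.
final class PathDispatchSketch extends DefaultHandler {
    // Several callbacks may be registered against the same path.
    private final Map<String, List<Consumer<String>>> callbacks = new HashMap<>();
    // Names of the currently open elements, innermost first.
    private final Deque<String> names = new ArrayDeque<>();

    void on(String path, Consumer<String> callback) {
        callbacks.computeIfAbsent(path, k -> new ArrayList<>()).add(callback);
    }

    // Builds a path such as "config/item", root element first.
    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = names.descendingIterator();
        while (it.hasNext()) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(it.next());
        }
        return sb.toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        names.push(qName);
        String path = currentPath();
        // Every callback registered for this path is triggered, in order.
        for (Consumer<String> cb : callbacks.getOrDefault(path, List.of())) {
            cb.accept(path);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        names.pop();
    }

    // Parses the xml with two callbacks on "config/item" and returns what they logged.
    static List<String> run(String xml) {
        List<String> log = new ArrayList<>();
        PathDispatchSketch handler = new PathDispatchSketch();
        handler.on("config/item", p -> log.add("first:" + p));
        handler.on("config/item", p -> log.add("second:" + p));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run("<config><item/><other/></config>"));
    }
}
```

Exact string matching stands in here for what a real "rule matcher"
would do; the point is only that a plain string path is enough for it.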

The xmlio concept of having a callback method invoked at element end
which passes both the element text and the element attributes is mildly
useful (but calling this method "startElement" is rather confusing IMO).
It would certainly be possible to add this feature to Digester/Digester2
(though it does have a minor performance drawback). With the current
Digester code, you can clone the attrs and push them on a (named) stack
in begin() and then fetch them back in body() to get the same effect.
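
The clone-and-stack idea can be sketched with plain SAX as follows. This
is an illustration only, not Digester code: AttrsAtEndSketch and its
elementComplete() method are hypothetical names, and the begin()/body()
hooks of a Digester Rule are replaced here by the SAX startElement and
endElement callbacks. The copy made for every element is where the
performance cost comes from:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: not the Digester or xmlio API.
final class AttrsAtEndSketch extends DefaultHandler {
    private final Deque<Attributes> attrStack = new ArrayDeque<>();
    private final Deque<StringBuilder> textStack = new ArrayDeque<>();
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // The parser may reuse the Attributes object, so keep a copy.
        attrStack.push(new AttributesImpl(attrs));
        textStack.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text belongs to the innermost open element.
        textStack.peek().append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // At element end, both the saved attributes and the body text are available.
        Attributes attrs = attrStack.pop();
        String text = textStack.pop().toString().trim();
        elementComplete(qName, attrs, text);
    }

    // Stand-in for a callback that receives attributes and text together.
    void elementComplete(String name, Attributes attrs, String text) {
        seen.add(name + "[id=" + attrs.getValue("id") + "]=" + text);
    }

    static List<String> run(String xml) {
        AttrsAtEndSketch handler = new AttrsAtEndSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        System.out.println(run("<msg id=\"7\">hello</msg>"));
    }
}
```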

My initial feeling, therefore, is that I would much rather see
additional work put into the Digester project than having this new xmlio
project essentially recreate a subset of Digester functionality.

The main problem with Digester, I think, is that it has nasty
inter-class dependencies that prevent subsets of the classes from being
distributed. Every Rule class depends upon the Digester class, which
provides "parse context". But the Digester class has factory methods for
all the rules - so Digester can't be distributed without including *all*
the Rule classes. Breaking this dependency is the number one priority
for Digester2 as far as I am concerned. It isn't that hard to do; I've
been experimenting with various refactorings already.
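
One possible shape for such a refactoring, purely as an illustration and
not a description of any actual Digester2 design: rules depend on a
narrow context interface rather than on the parser class, and the
factory methods move into a separate class that only the "full" jar
would need. Every name below (ParseContext, CoreParser, PushNameRule,
ConvenienceFactories) is hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical names throughout; not Digester's or Digester2's API.

/** The "parse context" a rule needs, expressed as a narrow interface. */
interface ParseContext {
    String getMatch();
    void push(Object o);
    Object pop();
}

/** Rules depend on ParseContext only, not on a concrete parser class. */
interface Rule {
    void begin(ParseContext context, String elementName);
}

/** Core class: knows about Rule in general, but about no concrete rule. */
final class CoreParser implements ParseContext {
    private final Deque<Object> stack = new ArrayDeque<>();
    private String match = "";

    @Override
    public String getMatch() {
        return match;
    }

    @Override
    public void push(Object o) {
        stack.push(o);
    }

    @Override
    public Object pop() {
        return stack.pop();
    }

    // Stand-in for what the parser does when an element matches a rule.
    void fire(String path, String elementName, Rule rule) {
        match = path;
        rule.begin(this, elementName);
    }
}

/** A concrete rule; could ship in a separate jar from CoreParser. */
final class PushNameRule implements Rule {
    @Override
    public void begin(ParseContext context, String elementName) {
        context.push(context.getMatch() + ":" + elementName);
    }
}

/** Factory methods live outside the core class, so the core has no rule dependencies. */
final class ConvenienceFactories {
    private ConvenienceFactories() {
    }

    static Rule pushName() {
        return new PushNameRule();
    }
}
```

With this layout CoreParser, ParseContext and Rule compile without any
concrete Rule class on the classpath, which is the property a "basic"
distribution needs.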

I would love to see several jar files built from the Digester2 source: a
"src" distro, a "full" jar, and a "basic" jar. The basic jar would have
about 8 classes, being about 4 classes of core functionality and 4 basic
Rule classes. In this form I think the "basic" jar could be entirely
appropriate for use by projects such as an i18n library without
resorting to creating a new project. Xmlio by itself doesn't provide any
functionality to actually instantiate objects or set properties; you
need to write one or more subclasses of SimpleImportHandler (similar to
ContentHandler), so by the time that is done I think that code based on
Digester and xmlio would be pretty similar in size.

There is one other significant issue: required libraries. Digester
depends upon BeanUtils, mainly because it performs dynamic conversion
between strings and other datatypes such as int, bool, etc. For a
light-weight parsing library this could be a nuisance; I expect we could
find a way of making automatic datatype conversion (and therefore
BeanUtils functionality) optional though. Digester also depends upon
commons-logging. I did make a proposal a while ago to make logging
dependencies in Digester optional; the patch wasn't received with any
great enthusiasm at the time, but with people actually pushing for this
it might make it into Digester2.

Regarding the "out" part of the xmlio libs: this is basically a
collection of static functions doing simple but useful xml string
encoding etc., and a stream class that does auto-indenting. Digester
certainly doesn't have anything like this. This code does feel like it
might be at home in "lang" or "codec"...

Oliver, if there was a "digester2" project which provided a "basic" jar
that was pretty light-weight and had only optional dependencies on
commons-beanutils and on commons-logging, might you consider using that
in i18n (or even Slide) instead of the xmlio code? (And would you be
interested in helping to create digester2??).

I'm finally going to be free of the horror that is my current job in
December, and plan to spend a fair chunk of January getting a Digester2
up and running (assuming that Craig/Robert et al are happy with that).
Even if xmlio goes ahead and the i18n component uses it, I will still
keep its features in mind when working on Digester2.


