commons-dev mailing list archives

From: Jeremias Maerki
Subject: Re: Proposal: Commons SAX
Date: Wed, 17 Dec 2008 14:02:35 GMT
FOP basically does all its XML parsing based on SAX. I don't see too
much overlap right now (without looking more closely). But maybe someone
can see something worth copying/moving in such a Commons SAX. A
colleague of mine had to implement something like a TeeContentHandler a
few weeks ago. So I guess there are use cases for things like that.
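
To make the idea concrete, here is a minimal sketch of what a tee-style handler could look like. It is not the Tika class or my colleague's code: SimpleTeeHandler and TeeDemo are hypothetical names, and only a handful of ContentHandler callbacks are forwarded. A complete version would also have to forward prefix mappings, ignorable whitespace, processing instructions and so on.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;

/** Hypothetical tee: forwards a subset of SAX events to every target handler. */
class SimpleTeeHandler extends DefaultHandler {
    private final ContentHandler[] targets;

    SimpleTeeHandler(ContentHandler... targets) {
        this.targets = targets;
    }

    @Override
    public void startDocument() throws SAXException {
        for (ContentHandler h : targets) h.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        for (ContentHandler h : targets) h.endDocument();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        for (ContentHandler h : targets) h.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        for (ContentHandler h : targets) h.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        for (ContentHandler h : targets) h.characters(ch, start, length);
    }
}

class TeeDemo {
    /** Parses one small document and feeds two independent handlers at once. */
    static String run() throws Exception {
        final StringBuilder names = new StringBuilder();
        final StringBuilder text = new StringBuilder();

        // First consumer: records element names in document order.
        DefaultHandler nameCollector = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.append(qName).append(' ');
            }
        };
        // Second consumer: records character data only.
        DefaultHandler textCollector = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(
                new InputSource(new StringReader("<a><b>hi</b></a>")),
                new SimpleTeeHandler(nameCollector, textCollector));
        return names.toString().trim() + "|" + text;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

The point is that the document is parsed only once while both consumers see the full event stream.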

Here's a link to the SAX utility classes we have in FOP if anyone is

Relevant classes:
ContentHandlerFactory (used for handling embedded non-FO documents)
DOM2SAX (adapted from Xalan)

That was mostly the parsing side, but generating XML by generating SAX
events can also be very interesting and, most of all, damn fast and
versatile. Raw SAX method calls for generating XML are very verbose,
though, and in many cases that verbosity is unnecessary. FOP's example
code contains something I wrote a long time ago.
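
To illustrate the verbosity of raw SAX calls, here is a minimal sketch using only the JDK's JAXP classes (it is not the FOP example code). It writes a single element with one attribute by pushing SAX events into an identity TransformerHandler; SaxGenerationDemo and the element and attribute names are made up for illustration.

```java
import org.xml.sax.helpers.AttributesImpl;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import java.io.StringWriter;

class SaxGenerationDemo {
    /** Serializes one element by firing SAX events at a serializing handler. */
    static String generate() throws Exception {
        // The JDK's default TransformerFactory supports the SAX feature,
        // so the cast is expected to succeed there.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Every piece of the document needs its own explicit call.
        handler.startDocument();
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "id", "id", "CDATA", "42");
        handler.startElement("", "item", "item", atts);
        char[] text = "hello".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "item", "item");
        handler.endDocument();

        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(generate());
    }
}
```

Nine statements for one tiny element is the kind of overhead a small helper layer can hide.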

I'm doing that differently today but I'm still not 100% happy. I've
written some new code in a dev branch I'm currently working in:
Relevant classes:

Example using the above classes:

In XML Graphics Commons, there's the interface XMLizable:
which I've used a number of times. Basically, I've stolen that from
Says by itself that this is obviously useful.
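
For anyone who hasn't seen the pattern: an XMLizable-style interface lets an object write itself out as SAX events to whatever handler it is given. The following is only a sketch under assumptions. The interface is declared locally as a stand-in, the single toSAX(ContentHandler) method is my assumption about its shape rather than a quote from XML Graphics Commons, and PointElement and XMLizableDemo are hypothetical.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Local stand-in for an XMLizable-style interface (assumed shape). */
interface XMLizable {
    void toSAX(ContentHandler handler) throws SAXException;
}

/** Hypothetical value object that knows how to emit itself as SAX events. */
class PointElement implements XMLizable {
    private final int x;
    private final int y;

    PointElement(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void toSAX(ContentHandler handler) throws SAXException {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "x", "x", "CDATA", Integer.toString(x));
        atts.addAttribute("", "y", "y", "CDATA", Integer.toString(y));
        // Emits only its own fragment; the caller owns start/endDocument.
        handler.startElement("", "point", "point", atts);
        handler.endElement("", "point", "point");
    }
}

class XMLizableDemo {
    /** Receives the object's events and renders them as a one-line summary. */
    static String describe(XMLizable obj) throws SAXException {
        final StringBuilder sb = new StringBuilder();
        obj.toSAX(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                sb.append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
                }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(describe(new PointElement(3, 4)));
    }
}
```

Because the object only sees a ContentHandler, the same toSAX() call can feed a serializer, a DOM builder or another SAX pipeline stage.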

Another handy utility is javax.xml.namespace.QName which is found
starting in Java 1.5. Since XML Graphics is still on Java 1.4 I've
written a similar class:
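
As a reminder of what the standard class offers (this sketch shows the JDK's javax.xml.namespace.QName, not the XML Graphics replacement): a QName bundles namespace URI, local part and prefix, and compares by namespace URI and local part only. QNameDemo is a made-up name; the XSL-FO namespace is just a convenient example.

```java
import javax.xml.namespace.QName;

class QNameDemo {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Two names that differ only in prefix are considered equal. */
    static boolean prefixIsIgnoredForEquality() {
        QName a = new QName(FO_NS, "block", "fo");
        QName b = new QName(FO_NS, "block", "x");
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    /** toString() yields the "{namespace}local" form. */
    static String expandedName() {
        return new QName(FO_NS, "block", "fo").toString();
    }

    public static void main(String[] args) {
        System.out.println(prefixIsIgnoredForEquality());
        System.out.println(expandedName());
    }
}
```

That equality rule is what makes QName usable as a map key for element or attribute lookups regardless of which prefix a document happens to use.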

So, I'm just listing what comes to my mind. No idea what is useful. At
any rate, the Commons SAX idea is a good one IMO. I'd be happy to help
out depending on the scope (i.e. what there is room for).

On 17.12.2008 14:09:00 Jukka Zitting wrote:
> Hi,
> In the Apache Tika project [1] we use SAX quite a lot, and have
> written a set of quite useful general utility classes for SAX
> handling.
> For example, in org.apache.tika.sax [2] we have the following:
> * ContentHandlerDecorator - Convenient base class for writing
> ContentHandler decorators
> * EmbeddedContentHandler - Decorator that blocks startDocument() and
> endDocument() calls
> * TeeContentHandler - Forwards SAX events to multiple handlers
> * TextContentHandler - Decorator that blocks everything but character
> events (and start/endDocument)
> * WriteOutContentHandler - Writes the contents of all character events
> to a Writer
> In org.apache.tika.sax.xpath [3] we have a simple XPath subset
> implementation that supports streaming and filtering of SAX events. In
> other words, the implementation doesn't need a DOM tree to evaluate
> XPath statements.
> I believe this code would be useful also outside Tika, and I was
> thinking that it might perhaps make sense to create a Commons project
> for this. I also know of some SAX processing classes in Cocoon and
> Jackrabbit that could well be of interest to a wider audience.
> Do you think something like this would be interesting as a Commons
> project? Are there other similar efforts that I should know of? I
> looked at XML Commons in, but it seems pretty dormant.
> [1]
> [2]
> [3]
> BR,
> Jukka Zitting

Jeremias Maerki
