commons-dev mailing list archives

From Jeremias Maerki <...@jeremias-maerki.ch>
Subject Re: Proposal: Commons SAX
Date Wed, 17 Dec 2008 14:02:35 GMT
FOP basically does all its XML parsing based on SAX. I don't see too
much overlap right now (without looking more closely). But maybe someone
can see something worth copying/moving in such a Commons SAX. A
colleague of mine had to implement something like a TeeContentHandler a
few weeks ago. So I guess there are use cases for things like that.
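
A tee handler of the kind mentioned above could look roughly like the
following. This is only a minimal sketch, not the code from FOP or Tika;
the class name TeeSketch is made up for illustration, and it simply
forwards every ContentHandler event to each wrapped handler in order.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: forwards each SAX event to all wrapped handlers,
// in the order in which they were passed to the constructor.
class TeeSketch extends DefaultHandler {
    private final ContentHandler[] targets;

    TeeSketch(ContentHandler... targets) {
        this.targets = targets.clone();
    }

    @Override public void setDocumentLocator(Locator locator) {
        for (ContentHandler h : targets) h.setDocumentLocator(locator);
    }
    @Override public void startDocument() throws SAXException {
        for (ContentHandler h : targets) h.startDocument();
    }
    @Override public void endDocument() throws SAXException {
        for (ContentHandler h : targets) h.endDocument();
    }
    @Override public void startPrefixMapping(String prefix, String uri) throws SAXException {
        for (ContentHandler h : targets) h.startPrefixMapping(prefix, uri);
    }
    @Override public void endPrefixMapping(String prefix) throws SAXException {
        for (ContentHandler h : targets) h.endPrefixMapping(prefix);
    }
    @Override public void startElement(String uri, String localName, String qName,
                                       Attributes atts) throws SAXException {
        for (ContentHandler h : targets) h.startElement(uri, localName, qName, atts);
    }
    @Override public void endElement(String uri, String localName, String qName)
            throws SAXException {
        for (ContentHandler h : targets) h.endElement(uri, localName, qName);
    }
    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        for (ContentHandler h : targets) h.characters(ch, start, length);
    }
    @Override public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        for (ContentHandler h : targets) h.ignorableWhitespace(ch, start, length);
    }
    @Override public void processingInstruction(String target, String data)
            throws SAXException {
        for (ContentHandler h : targets) h.processingInstruction(target, data);
    }
    @Override public void skippedEntity(String name) throws SAXException {
        for (ContentHandler h : targets) h.skippedEntity(name);
    }
}
```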

Here's a link to the SAX utility classes we have in FOP if anyone is
interested:
http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/util/

Relevant classes:
ContentHandlerFactory (used for handling embedded non-FO documents)
ContentHandlerFactoryRegistry
DOMBuilderContentHandlerFactory
DelegatingContentHandler
DOM2SAX (adapted from Xalan)

That was mostly the parsing side, but generating XML by emitting SAX
events can also be very interesting and, most of all, damn fast and
versatile. Pure SAX method calls for generating XML are very verbose,
though, and in many cases unnecessarily so. FOP's example code contains
something I wrote a long time ago:
http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/examples/embedding/java/embedding/tools/
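
To illustrate how verbose plain SAX calls are, here is a minimal sketch
that writes a single element with one attribute using only the standard
JAXP API. It is not taken from the FOP examples linked above; the class
name SaxGenerationSketch and the element and attribute names are made up.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch: serializes hand-written SAX events to a String.
class SaxGenerationSketch {
    static String generate() throws Exception {
        // An identity TransformerHandler turns incoming SAX events into text.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Even one element with one attribute takes several calls.
        handler.startDocument();
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "id", "id", "CDATA", "42");
        handler.startElement("", "item", "item", atts);
        char[] text = "hello".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "item", "item");
        handler.endDocument();
        return out.toString();
    }
}
```

A small helper layer on top of ContentHandler can hide most of the
repeated namespace and qualified-name arguments.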

I'm doing that differently today but I'm still not 100% happy. I've
written some new code in a dev branch I'm currently working in:
http://svn.apache.org/viewvc/xmlgraphics/fop/branches/Temp_AreaTreeNewDesign/src/java/org/apache/fop/util/
Relevant classes:
GenerationHelperContentHandler
XMLConstants
XMLUtil

Example using the above classes:
http://svn.apache.org/viewvc/xmlgraphics/fop/branches/Temp_AreaTreeNewDesign/src/sandbox/org/apache/fop/render/svg/SVGPainter.java?view=markup

In XML Graphics Commons, there's the interface XMLizable:
http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/util/XMLizable.java?view=markup
which I've used a number of times. Basically, I've stolen that from
Excalibur/Cocoon:
http://excalibur.apache.org/apidocs/org/apache/excalibur/xml/sax/XMLizable.html
That it already exists in several projects shows that it is useful.
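
The idea behind such an interface can be sketched as follows, assuming
it boils down to a single method that replays an object as SAX events
into a ContentHandler. The names XMLizableSketch and PointSketch are
hypothetical and only stand in for the real interface and an
implementing class.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Assumed shape: one method that emits the object as SAX events.
interface XMLizableSketch {
    void toSAX(ContentHandler handler) throws SAXException;
}

// Hypothetical value object that knows how to describe itself in XML.
class PointSketch implements XMLizableSketch {
    private final int x;
    private final int y;

    PointSketch(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public void toSAX(ContentHandler handler) throws SAXException {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "x", "x", "CDATA", Integer.toString(x));
        atts.addAttribute("", "y", "y", "CDATA", Integer.toString(y));
        // The caller owns startDocument()/endDocument(), so the object
        // can be embedded anywhere in a larger event stream.
        handler.startElement("", "point", "point", atts);
        handler.endElement("", "point", "point");
    }
}
```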

Another handy utility is javax.xml.namespace.QName, which is available
starting with Java 1.5. Since XML Graphics is still on Java 1.4, I've
written a similar class:
http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/util/QName.java?view=markup
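
A stand-in for QName might look like the sketch below. It is not the
XML Graphics Commons class linked above; SimpleQName and its method
names are made up. Like javax.xml.namespace.QName, it compares only the
namespace URI and the local name, and ignores the prefix.

```java
import java.io.Serializable;

// Hypothetical stand-in for a qualified name on pre-1.5 style code bases.
final class SimpleQName implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String namespaceURI;
    private final String localName;
    private final String prefix;

    SimpleQName(String namespaceURI, String localName, String prefix) {
        if (localName == null) {
            throw new NullPointerException("localName must not be null");
        }
        this.namespaceURI = namespaceURI == null ? "" : namespaceURI;
        this.localName = localName;
        this.prefix = prefix == null ? "" : prefix;
    }

    String getNamespaceURI() { return namespaceURI; }
    String getLocalName() { return localName; }
    String getPrefix() { return prefix; }

    // Returns "prefix:local", or just "local" when there is no prefix.
    String getQualifiedName() {
        return prefix.length() == 0 ? localName : prefix + ":" + localName;
    }

    // The prefix is deliberately left out of equality and hashing.
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof SimpleQName)) return false;
        SimpleQName other = (SimpleQName) obj;
        return namespaceURI.equals(other.namespaceURI)
                && localName.equals(other.localName);
    }

    public int hashCode() {
        return 31 * namespaceURI.hashCode() + localName.hashCode();
    }

    public String toString() {
        return "{" + namespaceURI + "}" + localName;
    }
}
```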

So, I'm just listing what comes to mind. No idea what is useful. At
any rate, the Commons SAX idea is a good one IMO. I'd be happy to help
out, depending on the scope (i.e. on what there is room for).

On 17.12.2008 14:09:00 Jukka Zitting wrote:
> Hi,
> 
> In the Apache Tika project [1] we use SAX quite a lot, and have
> written a set of quite useful general utility classes for SAX
> handling.
> 
> For example, in org.apache.tika.sax [2] we have the following:
> 
> * ContentHandlerDecorator - Convenient base class for writing
> ContentHandler decorators
> * EmbeddedContentHandler - Decorator that blocks startDocument() and
> endDocument() calls
> * TeeContentHandler - Forwards SAX events to multiple handlers
> * TextContentHandler - Decorator that blocks everything but character
> events (and start/endDocument)
> * WriteOutContentHandler - Writes the contents of all character events
> to a Writer
> 
> In org.apache.tika.sax.xpath [3] we have a simple XPath subset
> implementation that supports streaming and filtering of SAX events. In
> other words, the implementation doesn't need a DOM tree to evaluate
> XPath statements.
> 
> I believe this code would be useful also outside Tika, and I was
> thinking that it might perhaps make sense to create a Commons project
> for this. I also know of some SAX processing classes in Cocoon and
> Jackrabbit that could well be of interest to a wider audience.
> 
> Do you think something like this would be interesting as a Commons
> project? Are there other similar efforts that I should know of? I
> looked at XML Commons in xml.apache.org, but it seems pretty dormant.
> 
> [1] http://lucene.apache.org/tika/
> [2] http://lucene.apache.org/tika/apidocs/org/apache/tika/sax/package-summary.html
> [3] http://lucene.apache.org/tika/apidocs/org/apache/tika/sax/xpath/package-summary.html
> 
> BR,
> 
> Jukka Zitting
> 



Jeremias Maerki

