commons-user mailing list archives

From Simon Kitching
Subject Re: [Digester] how stateless is a Rules implementation supposed to be?
Date Wed, 03 Aug 2005 11:57:43 GMT
On Wed, 2005-08-03 at 12:00 +0200, Aad Nales wrote:
> Hi,
> I am trying to create cachable parsing rules for a number of Digester 
> objects that I am using in my app. From various sources I have understood 
> that a class implementing the Rules interface is supposed to be 
> stateless, which seems very logical. Here, however, is what I don't get.
> On the Rules interface a 'setDigester' method exists which is used to 
> associate the Rules object with the Digester. My question is why? Or, 
> perhaps more importantly, what is the effect?
> Suppose I create a Rules object holding my parser rules, e.g. MyRules 
> with an instance myRules, and I call digester.setRules(myRules). My 
> guess is that some kind of callback takes place that associates this 
> digester with myRules.
> Suppose I reuse the myRules object for digester2. What kind of effects 
> should I expect? And if none (which I guess I am sort of hoping for :-) 
> why is a RuleSet associated with a Rules object?

When a Rule object matches an element and its begin/body/end methods
are invoked, it needs to know which Digester object to operate on,
because the Digester holds relevant data such as the object stack.

In hindsight it would be better for the begin/body/end methods to be
passed the Digester object so that the Rule objects could truly be
stateless - but that's not the way it's done. Instead each Rule object
has its setDigester method called at some point before parsing starts
and the reference is cached for later use.

The Rules.setDigester(d) method is expected to do:
 * for each rule, call rule.setDigester(d)

And exactly as you suppose, Digester.setRules does:
    public void setRules(Rules rules) {
        this.rules = rules;
        this.rules.setDigester(this);
    }
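
The chain of calls described above can be modelled in a few lines. This
is a simplified, self-contained sketch and not the Digester source:
MiniDigester, MiniRules and MiniRule are hypothetical stand-ins for
Digester, Rules and Rule, and only the setDigester plumbing is modelled.
It shows that handing the same rules object to a second digester rebinds
every rule in it to that second digester.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Rule: caches the digester it was last bound to.
class MiniRule {
    private MiniDigester digester;

    void setDigester(MiniDigester d) { this.digester = d; }
    MiniDigester getDigester() { return digester; }
}

// Hypothetical stand-in for a Rules implementation.
class MiniRules {
    private final List<MiniRule> rules = new ArrayList<>();

    void add(MiniRule rule) { rules.add(rule); }

    // For each rule, call rule.setDigester(d).
    void setDigester(MiniDigester d) {
        for (MiniRule rule : rules) {
            rule.setDigester(d);
        }
    }
}

// Hypothetical stand-in for Digester.
class MiniDigester {
    private MiniRules rules;

    // Storing the rules also binds them, and every rule in them, to this digester.
    void setRules(MiniRules rules) {
        this.rules = rules;
        this.rules.setDigester(this);
    }
}

class RebindDemo {
    public static void main(String[] args) {
        MiniRule rule = new MiniRule();
        MiniRules shared = new MiniRules();
        shared.add(rule);

        MiniDigester first = new MiniDigester();
        MiniDigester second = new MiniDigester();

        first.setRules(shared);
        System.out.println(rule.getDigester() == first);   // true: bound to first

        second.setRules(shared);
        System.out.println(rule.getDigester() == second);  // true: rebound to second
        System.out.println(rule.getDigester() == first);   // false: first has lost it
    }
}
```

So in this model the last digester to call setRules wins, which is why
sequential reuse can work but concurrent use cannot.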

The RulesBase class is the standard implementation of the Rules
interface. I believe that apart from this setting of the Digester, the
RulesBase class is stateless. So while you can't use one concurrently
from multiple threads, it should be possible to reuse one in a sequence
of Digester parses.

Unfortunately, some of the Rule classes aren't properly stateless, even
ignoring their "digester" reference, so unless you're very careful to
limit which Rule objects are stored in the Rules object you're back to
something that can't be safely reused. It will probably work as long as
the parse is successful (Rule objects do tend to clean up their state as
long as they run successfully) but if an error occurs during parsing
some Rule objects can be left in odd states, and hence the Rules object
that manages them isn't safe to reuse either.
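
To make that failure mode concrete, here is a toy model of a rule that
keeps per-parse state in a field. It is an illustration only:
TrackingRule and its begin/end methods are hypothetical and do not
correspond to any particular Digester Rule class. The rule cleans up in
end(), so a successful parse leaves it pristine, but an exception
between begin() and end() leaves stale state behind for the next parse.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical rule that remembers open elements between begin() and end().
class TrackingRule {
    private final Deque<String> open = new ArrayDeque<>();

    void begin(String element) { open.push(element); }

    // Cleans up the state pushed by the matching begin().
    void end(String element) { open.pop(); }

    boolean isClean() { return open.isEmpty(); }
}

class StaleStateDemo {
    // Simulates parsing one element; if 'fail' is set, the parse aborts
    // after begin() and end() is never reached.
    static void parse(TrackingRule rule, boolean fail) {
        rule.begin("item");
        if (fail) {
            throw new IllegalStateException("simulated parse error");
        }
        rule.end("item");
    }

    public static void main(String[] args) {
        TrackingRule rule = new TrackingRule();

        parse(rule, false);
        System.out.println("clean after success: " + rule.isClean());  // true

        try {
            parse(rule, true);
        } catch (IllegalStateException e) {
            // The parse failed part-way through; end() never ran.
        }
        System.out.println("clean after failure: " + rule.isClean());  // false
    }
}
```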

The RuleSet and RuleSetBase classes look at first glance as if they are
intended to provide "reusable" sets of Rule objects, but in fact they
don't. Instead, they are a way of grouping sets of Rule objects which
match elements in the same xml namespace. As the Rule objects in the set
aren't properly stateless, the RuleSet is just as dangerous to reuse as
classes implementing the Rules interface.

I'm not aware of any link between a RuleSet and a Rules object.

In short, as far as I know there is no way to achieve what you want with
Digester 1.x. Each time you parse a new document, you really need to
create a fresh set of Rule objects, and a fresh Rules object to manage
them. Where a digester parse has completed without error you *probably*
can reuse the rules and their enclosing Rules object safely; at least
I'm not aware of any particular problems though I haven't tried this
myself. On a parse failure, though, you *must* create everything fresh.
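
If you do want to try reuse after successful parses, one way to apply
the rule of thumb above is to keep the rules object only while every
parse with it has succeeded and to rebuild it after any failure. The
sketch below shows that policy in isolation. ReusableHolder is a
hypothetical helper, not part of Digester, and the Supplier stands in
for whatever code builds your Rule objects and their Rules object. It is
deliberately not thread-safe, so it would need one holder per thread.

```java
import java.util.function.Supplier;

// Hypothetical helper: hands out a cached instance and discards it after a failed parse.
class ReusableHolder<T> {
    private final Supplier<T> factory;
    private T cached;

    ReusableHolder(Supplier<T> factory) { this.factory = factory; }

    // Returns the cached instance, building a fresh one if none is available.
    T acquire() {
        if (cached == null) {
            cached = factory.get();
        }
        return cached;
    }

    // Call after each parse; a failure throws the cached instance away.
    void release(boolean parseSucceeded) {
        if (!parseSucceeded) {
            cached = null;
        }
    }
}

class ReusePolicyDemo {
    public static void main(String[] args) {
        // Object stands in for a freshly built Rules object.
        ReusableHolder<Object> holder = new ReusableHolder<>(Object::new);

        Object a = holder.acquire();
        holder.release(true);
        Object b = holder.acquire();
        System.out.println("reused after success: " + (a == b));  // true

        holder.release(false);
        Object c = holder.acquire();
        System.out.println("reused after failure: " + (b == c));  // false
    }
}
```

In real use the release call would sit in a finally block around the
parse, with the flag set only once the parse has returned normally.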

I don't know whether your mention of digester2 was a typo. As it
happens, quite a bit of work has occurred on a version 2.0 of digester
and fixing all of the above issues was one of the primary goals. The
code is in the apache subversion repository. There are a lot of other
goals for digester2, however, that have not yet been implemented so it
is a long way from an official release.

For more information you really need to download the source code and
have a look for yourself. Digester isn't a very large library and it is
pretty well commented (I think) so shouldn't be too hard to understand.

I guess that you're asking these questions because you're looking to
build a very high-performance system that parses many documents. In
that case, Digester might not be the best tool for you. Digester is
wonderfully flexible in the way it maps XML to Java objects, but its
heavy use of reflection doesn't make it the fastest solution around. If
performance is the priority, tools that use a precompiler to process a
schema definition and generate custom parsing code for that schema will
outperform Digester by a fair margin.


