commons-user mailing list archives

From Simon Kitching
Subject Re: Digester's pattern matching: bug or misunderstanding?
Date Mon, 19 Jan 2004 22:22:43 GMT
Hi Olaf,

On Tue, 2004-01-20 at 07:17, wrote:
> I'm currently writing an XML config file parser using the Digester framework. I've spent
> quite a while debugging the digester code, since I really did not understand how digester
> was building the object stack. I now understand it but am wondering if this is a bug ...
> Here are the details. My XML document roughly has the following structure (strongly simplified
> for demonstration purposes):
> <my-sender>
> 	<managed-object>
> 		<descriptor/>
> 	</managed-object>
> </my-sender>
> <my-sender>
> 	<multi-dimension-object>
> 		<managed-object>
> 			<descriptor/>
> 		</managed-object>
> 		<other-dimension>
> 			<descriptor/>
> 		</other-dimension>
> 	</multi-dimension-object>
> </my-sender>
> I now have added the following rules:
> 	digester.addObjectCreate("*/my-sender", Sender.class);
> 	digester.addObjectCreate("*/managed-object", ManagedObject.class);
> 	digester.addObjectCreate("*/multi-dimension-object", MultiDimensionObject.class);
> 	digester.addObjectCreate("*/descriptor", Descriptor.class);
> 	digester.addSetNext("*/managed-object/descriptor", "setDescriptor");
> 	digester.addSetNext("*/other-dimension/descriptor", "doSomethingElse");
> With this setup the "Descriptor" object is *NOT* created, since the rules associated
> with the longer "*/managed-object/descriptor" or "*/other-dimension/descriptor" matches
> are executed before the ObjectCreateRule which is matched by "*/descriptor".

The exact behaviour depends on which Rules class is being used to do the
pattern matching (see Digester#setRules). I presume you are using the
default Rules class (RulesBase). I'm not sure how the other Rules
classes would handle this situation.

Actually (for the default rules class), the longer match is called
*instead* of the shorter one. In your example, the ObjectCreateRule for
Descriptor.class won't be fired at all because "*/descriptor" is not the
best matching pattern for the node.

All rules associated with a specific pattern are grouped together. The
rules associated with the best-matching pattern are executed, and all
other rules are ignored even if they are associated with a pattern that
would also match.

  "*/bar" --> Rule "a"
  "*/bar" --> Rule "b"
  "foo/bar" --> Rule "c"
  "foo/bar" --> Rule "d"

For a "foo/bar" node, rules "c" and "d" are executed in that order, and
rules "a" and "b" are not executed at all because "*/bar" is not the
best-matching (most precise) pattern.
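
To make the selection rule concrete, here is a small self-contained sketch
of "only the best-matching pattern fires". It is a simplified model of the
behaviour described above, not the actual RulesBase source; the class and
method names (BestMatchSketch, add, match) are made up for illustration, and
it assumes that an exact pattern beats any "*/" pattern and that among "*/"
patterns the longest one wins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of "best pattern wins" matching (hypothetical, not Digester code).
class BestMatchSketch {
    // pattern -> rule names, kept in the order the rules were added
    private final Map<String, List<String>> rules = new LinkedHashMap<>();

    void add(String pattern, String ruleName) {
        rules.computeIfAbsent(pattern, k -> new ArrayList<>()).add(ruleName);
    }

    // Returns the rules registered for the single best-matching pattern.
    List<String> match(String path) {
        // An exact (non-wildcard) pattern beats any wildcard pattern.
        List<String> exact = rules.get(path);
        if (exact != null) {
            return exact;
        }
        // Otherwise the longest "*/..." pattern whose tail matches the path wins.
        String best = null;
        for (String pattern : rules.keySet()) {
            if (!pattern.startsWith("*/")) {
                continue;
            }
            String tail = pattern.substring(2);
            boolean matches = path.equals(tail) || path.endsWith("/" + tail);
            if (matches && (best == null || pattern.length() > best.length())) {
                best = pattern;
            }
        }
        return best == null ? Collections.<String>emptyList() : rules.get(best);
    }

    public static void main(String[] args) {
        BestMatchSketch m = new BestMatchSketch();
        m.add("*/bar", "a");
        m.add("*/bar", "b");
        m.add("foo/bar", "c");
        m.add("foo/bar", "d");
        System.out.println(m.match("foo/bar")); // [c, d]
        System.out.println(m.match("baz/bar")); // [a, b]

        // The situation from the original question:
        BestMatchSketch q = new BestMatchSketch();
        q.add("*/descriptor", "ObjectCreate(Descriptor)");
        q.add("*/managed-object/descriptor", "SetNext(setDescriptor)");
        // Only the SetNext rule is selected; the ObjectCreate rule is skipped.
        System.out.println(q.match("my-sender/managed-object/descriptor"));
    }
}
```

In this model the shorter "*/descriptor" pattern is simply never consulted
once a longer pattern matches, which is the effect you are seeing.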

> Digester now calls the "setDescriptor" method on a Sender instance instead of on a ManagedObject

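Yep. SetNextRule works on whatever is on top of the object stack. Because
the Descriptor object was never created (the ObjectCreateRule never
fired), the top object is the ManagedObject and its parent is the Sender.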
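
The stack effect can be illustrated with a toy model. This is only a sketch:
the class SetNextStackSketch and its describeSetNext method are hypothetical,
and strings stand in for the real objects. It assumes that a set-next step
takes the top of the stack as the child and the next entry as the parent.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Toy model of a set-next step on an object stack (hypothetical, not Digester code).
class SetNextStackSketch {
    // Describes the call a set-next step would make: parent.method(child),
    // where child is the top of the stack and parent is the entry below it.
    static String describeSetNext(Deque<String> stack, String method) {
        Iterator<String> it = stack.iterator(); // iterates from the top downwards
        String child = it.next();
        String parent = it.next();
        return parent + "." + method + "(" + child + ")";
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("Sender");
        stack.push("ManagedObject");

        // Descriptor was never pushed, so the call lands one level too high.
        System.out.println(describeSetNext(stack, "setDescriptor"));
        // Sender.setDescriptor(ManagedObject)

        // With the Descriptor on the stack, the call goes where it was intended.
        stack.push("Descriptor");
        System.out.println(describeSetNext(stack, "setDescriptor"));
        // ManagedObject.setDescriptor(Descriptor)
    }
}
```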

> I would expect Digester to create the "Descriptor" object before doing anything else
> with the object.
> It seems that digester does not handle rules in the proper order. Some rules are clearly
> associated with the creation (invoked in start() function) of an element (e.g. ObjectCreateRule).
> Other rules (e.g. SetNextRule) should be invoked after object creation (invoked in end() function).
> digester does not seem to take this into account.

The digester always preserves the order in which rules were added to the
digester. Every rule class has a begin(), body() and end() method.

The begin() method of each matched rule is executed in the order the rules
were added. The body() method of each matched rule is executed in the
order the rules were added. The end() method of each matched rule is
executed in the reverse of that order (to ensure proper stack-based
behaviour).
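
As a rough illustration of that ordering, here is a toy model. The class
RuleOrderSketch and its fire method are invented for this example and rules
are represented only by their names; it is not how Digester is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the callback order for one element (hypothetical, not Digester code).
class RuleOrderSketch {
    // matchedRules holds rule names in the order the rules were added.
    static List<String> fire(List<String> matchedRules) {
        List<String> calls = new ArrayList<>();
        for (String rule : matchedRules) {
            calls.add(rule + ".begin"); // begin(): order added
        }
        for (String rule : matchedRules) {
            calls.add(rule + ".body");  // body(): order added
        }
        for (int i = matchedRules.size() - 1; i >= 0; i--) {
            calls.add(matchedRules.get(i) + ".end"); // end(): reverse order
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(fire(Arrays.asList("create", "setNext")));
        // [create.begin, setNext.begin, create.body, setNext.body, setNext.end, create.end]
    }
}
```

With an ObjectCreateRule added before a SetNextRule on the same pattern,
the object is pushed in create.begin and only popped in create.end, so it
is still on the stack when setNext.end runs.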

Some rules do their work in begin(), a few in body(), and some in end();
occasionally you need to be aware of this. However the order is always

As noted above, your problem is not related to the ordering of the
ObjectCreateRule vs the SetNextRule; the ObjectCreateRule is *never*
being fired.

> Is this a bug, or do I need more clarification on how the matching patterns really 
> influence digester's behavior? Comments are welcome.

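I recommend replacing

digester.addObjectCreate("*/descriptor", Descriptor.class);

with this:

digester.addObjectCreate("*/managed-object/descriptor", Descriptor.class);

You will presumably want an equivalent rule for
"*/other-dimension/descriptor" too, so that each descriptor pattern
creates its own object before its SetNextRule fires.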

Yes, it probably would be nice for the default Rules engine to be able
to execute all matching patterns. However the existing approach
(matching only one pattern) allows the default Rules class to make some
significant performance optimisations. I can't see this behaviour
changing, not least because that could break existing code.

The existing approach also allows the user to set up "general" rules,
then override them with rules for more specific patterns. This wouldn't
be possible if all the matching rules ended up being executed.

I'm playing with some ideas that might go into a Digester 2.0. I'll keep
this scenario in mind. A 2.0 release would be a long way off, though, if
