commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Simon Kitching <>
Subject Re: [digester2] Additions
Date Mon, 14 Feb 2005 12:08:53 GMT
Hi Oliver,

On Mon, 2005-02-14 at 08:16 +0100, Oliver Zeigermann wrote:
> [update]
> - now all member variables of DefaultRuleManager are private -
> introduced copy ctor for copying members from subclasses

thanks - that looks good to me.

> - DefaultRuleManager now returns an unmodifiable list upon getMatchingActions

I'm not so sure about this one. 

This change means a new object is created each time getMatchingActions
is called, and that is once for every xml element in the input document.
And in addition, each method call is "relayed" by the UnmodifiableList
wrapper to the underlying one, resulting in a performance hit too. Is
this significant in the context of parsing an xml document? I don't
know; maybe it's not measurable. 

However the benefits of this change are really limited only to
preventing bad code in SAXHandler or RuleManager classes. Cases where
users write their own versions of those will be extremely rare. And
mistakes in the core digester classes are, I think, likely to be picked
up by our unit tests.

So my current feeling is that while the above change **is theoretically
correct**, I would leave it out and return the actual list, just
documenting that it should be treated as unmodifiable.


> - created a new FallbackRuleManager class that only takes care about
> fallback actions

[see my comments at end first!]

Have you considered writing this class as a *decorator* rather than a
subclass of DefaultRuleManager? If it was a decorator, it could then be
applied to any rulemanager class, not just the default. It would be used
as follows:
  RuleManager realrm = new DefaultRuleManager(); // or any other type
  RuleManager rm = new FallbackRuleManagerDecorator(realrm);
  Digester digester = new Digester(rm);

The disadvantage would be that it would be necessary to implement
each of the RuleManager methods in order to forward them to the
decorated instance, together with the performance hit that would entail.
[ah, if only this were Ruby....]

> - SupplementaryRuleManager now takes account of the explicitely
> unmodifiable list returned by DefaultRuleManager and does a wild
> mixture of caching and concatenating the list returned from
> DefaultRuleManager and its own list of supplementary actions

[see my comments at end first!]

The ReadOnlyConcatList is cute. I don't think it would save any time or
effort over
  List list = new ArrayList(l1.size() + l2.size());
but I guess the unmodifiable feature is thrown in for free.

I've been thinking about the caching and see a mild problem. If wildcard
pattern "a" (equivalent to "*/a" in digester1), and <a> elements are
scattered around in various places in the input document, then the cache
is going to grow fairly large.

So basically, whether caching improve things or not depends upon whether
the input xml is simple and repetitive (caching wins) or reasonably
unstructured (caching loses). And given we can't tell which is more
likely over the set of digester users (at least I don't have a clue) I
would probably prefer the simplest solution - not to cache.

I'm also thinking about the possibility that we may move to a
RuleManager style that matches based on much richer state than just the
current path. FOr example, if we want to support patterns like this:
 "/foo/bar[@id='me']/baz[pos()==last()]"    [1]
then getMatchingActions will definitely not be taking a String path any
more, but instead an array of w3c.dom.Element objects or similar so it
has sufficient info to decide whether the pattern matches or not. That
won't affect many places in the code, but would make this caching
impossible to implement so we'd have to take it out then anyway.

[1] sorry my xpath is a bit rusty - this is probably not valid

> - Both FallbackRuleManager and SupplementaryRuleManager now can have
> several callback actions
> - Still docs and tests need improvement - will do so once the code
> seems somewhat stable
> Comments?

You're probably going to hit me for this :-)

After some thought, I think my preferred option would be to include
"fallback" and "supplementary" features as standard, ie define them in
the RuleManager interface, provide an implementation in the
DefaultRuleManager class, and to remove the FallbackRuleManager and
SupplementaryRuleManager classes completely.

I've always said I'm happy for fallback features to be in
DefaultRuleManager, and after thought I believe it ought to be defined
as part of the main RuleManager interface. And as you're so keen for
supplementary actions, I'm ok with adding them too as long as there is
no performance hit for people who don't use them. Note that this would
definitely be the easiest solution from the user point of view, with no
special classes required in order to use "supplementary" actions.

And I would go for the simplest implementation of "supplementary"
actions, by simply creating a new list each time getMatchingActions is
called when one or more supplementary actions are defined, without any
clever caching. Your efforts at performance improvements for
supplementary actions are impressive, but it's not clear to me that they
would in the end work any better than just creating new lists and
simpler is definitely better. I don't actually think there *is* a way to
implement supplementary actions without a performance penalty except to
integrate them deeply into the concrete RuleManager's data model for
storing rules and I would be against that just because I'm not convinced
they will be widely used. 

But if we have them in the main API, and we later find that people *are*
using them often then we can think about internal optimisations at that

The result would be that DefaultRuleManager instances without any
supplementary actions defined would work without any performance
penalty. Users who choose to use supplementary actions will pay a
(fairly small) penalty. Digester developers will be happy because, while
there is extra code required in each concrete RuleManager class in order
to support supplementary actions, the code will be simple and obvious.

so here's what I would see for RuleManager interface:

public interface RuleManager
  public void addFallbackAction(Action action);
  public void addFallbackActions(List actions);
  public void addSupplementaryAction(Action action);
  public void addSupplementaryActions(List actions);

and for DefaultRuleManager.getMatchingActions(String path) {
   ... current stuff ...

   if (actions == null) {
    actions = fallbackActions;

   if (supplementaryActions != null) {
     oldActions = actions;
     actions = new ArrayList(
	oldActions.size() + supplementaryActions.size();

  return actions;

It would be tempting to put the fallback/supplementary login in
AbstractRuleManager but I think that would be a mistake. The point of
that abstract class is *not* to provide any default behaviour, but
simply to provide ourselves with a place to put hooks for "backward
compatibility" if we change the RuleManager interface. If we put logic
in here, we tempt people who don't want our version of the logic to
implement the RuleManager interface instead of the AbstractRuleManager
class, which we really don't want to encourage.

Over to you Oliver....



To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message