commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Oliver Zeigermann <oliver.zeigerm...@gmail.com>
Subject Re: [digester2] Additions
Date Mon, 14 Feb 2005 16:24:54 GMT
On Tue, 15 Feb 2005 01:08:53 +1300, Simon Kitching <skitching@apache.org> wrote:
> > - DefaultRuleManager now returns an unmodifiable list upon getMatchingActions
> 
> I'm not so sure about this one.
> 
> This change means a new object is created each time getMatchingActions
> is called, and that is once for every xml element in the input document.
> And in addition, each method call is "relayed" by the UnmodifiableList
> wrapper to the underlying one, resulting in a performance hit too. Is
> this significant in the context of parsing an xml document? I don't
> know; maybe it's not measurable.
> 
> However the benefits of this change are really limited only to
> preventing bad code in SAXHandler or RuleManager classes. Cases where
> users write their own versions of those will be extremely rare. And
> mistakes in the core digester classes are, I think, likely to be picked
> up by our unit tests.
> 
> So my current feeling is that while the above change **is theoretically
> correct**, I would leave it out and return the actual list, just
> documenting that it should be treated as unmodifiable.
> 

Well, think of the situation where people inherit and do things like I
have done on my first try. So, I would be all for keeping it in, but
would not protest if you removed it.

> > - created a new FallbackRuleManager class that only takes care about
> > fallback actions
> 
> [see my comments at end first!]
> 
> Have you considered writing this class as a *decorator* rather than a
> subclass of DefaultRuleManager? If it was a decorator, it could then be
> applied to any rulemanager class, not just the default. It would be used
> as follows:
>   RuleManager realrm = new DefaultRuleManager(); // or any other type
>   RuleManager rm = new FallbackRuleManagerDecorator(realrm);
>   Digester digester = new Digester(rm);
> 
> The disadvantage would be that it would be necessary to implement
> each of the RuleManager methods in order to forward them to the
> decorated instance, together with the performance hit that would entail.
> [ah, if only this were Ruby....]
> 
> > - SupplementaryRuleManager now takes account of the explicitely
> > unmodifiable list returned by DefaultRuleManager and does a wild
> > mixture of caching and concatenating the list returned from
> > DefaultRuleManager and its own list of supplementary actions
> 
> [see my comments at end first!]
> 
> The ReadOnlyConcatList is cute. I don't think it would save any time or
> effort over
>   List list = new ArrayList(l1.size() + l2.size());
>   list.appendAll(l1);
>   list.appendAll(l2);
> but I guess the unmodifiable feature is thrown in for free.

ReadOnlyConcatList certainly would be faster and have a smaller memory
footprint.

> I've been thinking about the caching and see a mild problem. If wildcard
> pattern "a" (equivalent to "*/a" in digester1), and <a> elements are
> scattered around in various places in the input document, then the cache
> is going to grow fairly large.
> 
> So basically, whether caching improve things or not depends upon whether
> the input xml is simple and repetitive (caching wins) or reasonably
> unstructured (caching loses). And given we can't tell which is more
> likely over the set of digester users (at least I don't have a clue) I
> would probably prefer the simplest solution - not to cache.

If the cache gets too large we can easily limit its size using LRUMap
or something.
 
> I'm also thinking about the possibility that we may move to a
> RuleManager style that matches based on much richer state than just the
> current path. FOr example, if we want to support patterns like this:
>  "/foo/bar[@id='me']/baz[pos()==last()]"    [1]
> then getMatchingActions will definitely not be taking a String path any
> more, but instead an array of w3c.dom.Element objects or similar so it
> has sufficient info to decide whether the pattern matches or not. That
> won't affect many places in the code, but would make this caching
> impossible to implement so we'd have to take it out then anyway.

We can worry about this as soon as getMatchingActions takes anything
else than a String. Right now the method signature that I implemented
says it is a String, so there is no problem.

> You're probably going to hit me for this :-)

I can't, you are too far away ;)
 
> After some thought, I think my preferred option would be to include
> "fallback" and "supplementary" features as standard, ie define them in
> the RuleManager interface, provide an implementation in the
> DefaultRuleManager class, and to remove the FallbackRuleManager and
> SupplementaryRuleManager classes completely.
> 
> I've always said I'm happy for fallback features to be in
> DefaultRuleManager, and after thought I believe it ought to be defined
> as part of the main RuleManager interface. And as you're so keen for
> supplementary actions, I'm ok with adding them too as long as there is
> no performance hit for people who don't use them. Note that this would
> definitely be the easiest solution from the user point of view, with no
> special classes required in order to use "supplementary" actions.
> 
> And I would go for the simplest implementation of "supplementary"
> actions, by simply creating a new list each time getMatchingActions is
> called when one or more supplementary actions are defined, without any
> clever caching. Your efforts at performance improvements for
> supplementary actions are impressive, but it's not clear to me that they
> would in the end work any better than just creating new lists and
> simpler is definitely better. I don't actually think there *is* a way to

I disagree. My efforts are relatively modest and pretty obvious and
actually bring performance gains.

> implement supplementary actions without a performance penalty except to
> integrate them deeply into the concrete RuleManager's data model for
> storing rules and I would be against that just because I'm not convinced
> they will be widely used.

The way I have done it now really has very limited performance
penalty. Maybe it is even faster than in DefaultRuleManager.

> But if we have them in the main API, and we later find that people *are*
> using them often then we can think about internal optimisations at that
> point.
> 
> The result would be that DefaultRuleManager instances without any
> supplementary actions defined would work without any performance
> penalty. Users who choose to use supplementary actions will pay a
> (fairly small) penalty. Digester developers will be happy because, while
> there is extra code required in each concrete RuleManager class in order
> to support supplementary actions, the code will be simple and obvious.

My intention was to have the supplementary action part work as fast as
in xmlio. So, why not keep it the way it is? Complexity is in another
class, so Digester developers should be happy as well.

> so here's what I would see for RuleManager interface:
> 
> public interface RuleManager
> {
>   public void addFallbackAction(Action action);
>   public void addFallbackActions(List actions);
>   public void addSupplementaryAction(Action action);
>   public void addSupplementaryActions(List actions);
> }
> 
> and for DefaultRuleManager.getMatchingActions(String path) {
>    ... current stuff ...
> 
>    if (actions == null) {
>     actions = fallbackActions;
>    }
> 
>    if (supplementaryActions != null) {
>      oldActions = actions;
>      actions = new ArrayList(
>         oldActions.size() + supplementaryActions.size();
>      actions.addAll(oldActions);
>      actions.addAll(supplementaryActions);
>    }
> 
>   return actions;
> }
> 

OK, if you really want to do this, I won't stop you.

Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Mime
View raw message