commons-dev mailing list archives

From "Edelson, Justin"
Subject RE: [digester] mixed content update
Date Tue, 30 Mar 2004 22:55:14 GMT
Simon - Sorry I haven't had a chance to respond to your email. I was
actually concentrating more on answering your question about use-cases,
but it sounds like I don't need to sell this need as much as I thought I
did (at least for now).

> If someone (eg Justin) is keen to work on this now, we could
> potentially get it in the next release. Otherwise I suggest this could
> go on the to-do list for post-1.6.

I don't know if "keen" is the right word, but I'm committed to
"digesting" mixed content XML for an application now, i.e. I need to
solve this problem one way or another (or use some other XML ingesting
mechanism, which I don't want to do for purely selfish reasons). At this
point, I'm planning on an implementation as a subclass. Whether this
subclass is accepted and put into the Digester release is dependent upon
a variety of factors, but my intention is to develop a solution either
way. I have sign-off on contributing modifications back to Apache, so
that's not an issue.

Of course, I have a high level of respect for the members of this list
(not blowing smoke, I swear) so I'm very interested in crafting the
mixed content solution based on any feedback Commons developers may
have.

Just to be clear, is there a timeframe for 1.6?

> Justin, if you have any arguments to back your original design, please
> speak up! Or if you are willing to try implementing some other approach
> that doesn't involve "@text" patterns, please speak up too.

Let's separate the two issues in my original design: using a special
text designator, and the specific designator used. To be honest, @text
was really just a placeholder on my end. The only requirement I have for
this designator is that it be an illegal XML element name so as to
ensure that there's no conflict (e.g. if the designator were just text,
that would pose an issue if you had an element named text). My core
argument in favor of using a specific designator is that it explicitly
indicates that the pattern (e.g. /element/@text) uses different
functionality than the traditional Digester method. This is a pretty
weak argument, I'll admit. I also feel (and can't prove yet) that this
method is better performance-wise because the extra iterations over the
list of rules are over a smaller list.
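
The "illegal XML element name" requirement can be checked mechanically.
The following is only a minimal sketch, using the JDK's DOM API rather
than anything in Digester; the class and method names are made up for
illustration. It relies on createElement() rejecting strings that are
not valid XML names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical helper, not part of Digester.
class DesignatorCheck {

    /** Returns true if the DOM accepts the string as an element name. */
    static boolean isLegalElementName(String name) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        try {
            doc.createElement(name);
            return true;
        } catch (DOMException e) {
            // Raised when the name is not a valid XML name.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("text  legal? " + isLegalElementName("text"));
        System.out.println("@text legal? " + isLegalElementName("@text"));
    }
}
```

Since "@" cannot start an XML name, "@text" is rejected while "text" is
accepted, so a pattern segment such as @text cannot collide with a real
element.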

As indicated above, I'm very willing to try alternate implementations,
including the interface solution you suggested.

> How do people feel about my initial proposed solution to this (as

My only concern about the interface solution (for lack of a better name)
is with the part where you wrote 'for each rule matched by the last call
to startElement'. In my original subclassed version, I had a call to
super.startElement(), but in order to do what you've described, I think
you'd need to replicate all the code in Digester.startElement() in the
subclass's startElement() method. Otherwise, the overridden
startElement() in the subclass would have to make an extra call to
Rules.match(). I was originally worried about having to maintain the
subclass's startElement method to reflect changes in the Digester
implementation, thus the call to super.startElement(). Is this too
dogmatic? I'm not looking to rehash the cut-and-paste vs. "eating our
own dog food" discussion recently seen in the context of [lang].
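
To make the trade-off concrete, here is a toy model of the
super.startElement() approach. It is a sketch only: ToyRules,
ToyDigester and ToyMixedContentDigester are invented stand-ins and do
not reproduce the real Digester or Rules code. The assumption is that
the base class keeps its matched rules local to startElement(), so the
subclass has to look them up again.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy rule matcher that counts how often match() is called. */
class ToyRules {
    private final List<String> patterns = new ArrayList<>();
    int matchCalls = 0;

    void add(String pattern) {
        patterns.add(pattern);
    }

    List<String> match(String path) {
        matchCalls++;
        List<String> hits = new ArrayList<>();
        for (String p : patterns) {
            if (p.equals(path)) {
                hits.add(p);
            }
        }
        return hits;
    }
}

/** Toy base class: the match result never leaves startElement(). */
class ToyDigester {
    final ToyRules rules = new ToyRules();
    protected String path = "";

    void startElement(String name) {
        path = path + "/" + name;
        List<String> matched = rules.match(path); // first lookup
        // ... the base class would fire its rules on 'matched' here ...
    }
}

/** Toy subclass: reuses the base method, then repeats the lookup. */
class ToyMixedContentDigester extends ToyDigester {
    List<String> lastMatched = new ArrayList<>();

    @Override
    void startElement(String name) {
        super.startElement(name);        // no copied code to maintain
        lastMatched = rules.match(path); // second lookup, the extra cost
    }
}
```

In this model each element costs two match() calls. Copying the body of
the base method into the subclass would bring that back to one, at the
price of tracking future changes to the base class by hand.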

I'll have some time later in the week to take a crack at implementing
the interface solution.

Yet a third implementation that I've been thinking about would be to
take the new interface and create some additional interfaces around it -
MixedContentRules and MixedContentRuleSet. These objects would basically
parallel the Rule/Rules/RuleSet interfaces. Within the
MixedContentDigester subclass, there'd be a new instance variable called
mixedContentRules. In short, the concept is that the classes that
implement MixedContentRule would be segregated from the traditional
Digester rules. The core reason for this is that I'm concerned about the
performance impact of both my original and Simon's solutions. By
segregating the rules, I've ensured that a match() call to a
MixedContentRules object only searches within MixedContentRules which
should lead to better performance.
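
The intuition behind segregating the rules can be sketched with a toy
rule store that counts pattern comparisons. Everything here is
hypothetical (CountingRuleStore and SegregationDemo are invented names),
and it assumes a plain linear scan over patterns. Digester's own matcher
may index patterns differently, so any real saving would have to be
measured.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical rule store that counts comparisons made by match(). */
class CountingRuleStore {
    private final List<String> patterns = new ArrayList<>();
    int comparisons = 0;

    void add(String pattern) {
        patterns.add(pattern);
    }

    List<String> match(String path) {
        List<String> hits = new ArrayList<>();
        for (String p : patterns) {
            comparisons++; // one comparison per stored pattern
            if (p.equals(path)) {
                hits.add(p);
            }
        }
        return hits;
    }
}

class SegregationDemo {

    /**
     * Returns { comparisons against one combined store,
     *           comparisons against a separate mixed-content store }.
     */
    static int[] compare(int ordinaryRules, int mixedRules) {
        CountingRuleStore combined = new CountingRuleStore();
        CountingRuleStore mixedOnly = new CountingRuleStore();
        for (int i = 0; i < ordinaryRules; i++) {
            combined.add("/doc/item" + i);
        }
        for (int i = 0; i < mixedRules; i++) {
            combined.add("/doc/para" + i);
            mixedOnly.add("/doc/para" + i);
        }
        combined.match("/doc/para0");
        mixedOnly.match("/doc/para0");
        return new int[] { combined.comparisons, mixedOnly.comparisons };
    }
}
```

With 100 ordinary rules and 3 mixed-content rules, the combined store
makes 103 comparisons for one text lookup and the segregated store
makes 3.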

> And if you think there are other features that could be added to
> digester using "@...."-style patterns then that would also be good to

I had worked up a use-case for @comment that would allow for comments to
be digested (imagine a JavaDoc/Xdoclet-style application that read
comments out of a struts config XML file). But then I remembered that
SAX ignores comments, so this is a bigger can of worms.
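
For context on the comments issue: the SAX ContentHandler interface has
no callback for comments, but the SAX2 extension interface
org.xml.sax.ext.LexicalHandler does report them when the parser supports
it. The sketch below uses only JDK classes and is not a Digester
feature; CommentCollector is an invented name, and wiring something like
this into Digester is exactly the can of worms mentioned above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: collects the text of every XML comment.
class CommentCollector extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();

    // LexicalHandler callback, invoked once per comment.
    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length).trim());
    }

    static List<String> collect(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                    .newSAXParser()
                    .getXMLReader();
            CommentCollector handler = new CommentCollector();
            reader.setContentHandler(handler);
            // Standard SAX2 property for registering a LexicalHandler.
            reader.setProperty(
                    "http://xml.org/sax/properties/lexical-handler",
                    handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.comments;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling CommentCollector.collect("<a><!-- hello --><b/></a>") returns a
list containing the single string "hello".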

I've gone ahead and submitted my test case to Bugzilla (#28068). I can
never remember how Bugzilla reacts to XML submitted in its forms, so I
kept that out. I assume that we're all on the same page as to what
"mixed content" means, but I can easily add an example.
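
As an illustration of what "mixed content" means here: an element whose
children are text interleaved with other elements, such as
<p>Hello <b>bold</b> world</p>. The sketch below is not the Bugzilla
test case; it is a plain SAX handler (MixedContentEvents is an invented
name) that records the order in which text and element events arrive.
Text is buffered because a parser may deliver one run of characters in
several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records SAX events in document order.
class MixedContentEvents extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Emit any buffered text as a single event.
    private void flushText() {
        if (text.length() > 0) {
            events.add("text:" + text);
            text.setLength(0);
        }
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        flushText();
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end:" + qName);
    }

    static List<String> parse(String xml) {
        try {
            MixedContentEvents handler = new MixedContentEvents();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the <p> example above the recorded events are start:p, "text:Hello ",
start:b, text:bold, end:b, "text: world", end:p. The two text runs on
either side of <b> belong to <p> itself, which is the case a body-text
rule fired only at the end of an element cannot tell apart.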

Thanks for the interest. I was a bit surprised at first that some were
so willing to write off Digester as just for configuration files.

