commons-dev mailing list archives

From Simon Kitching <>
Subject RE: [digester] mixed content update
Date Tue, 30 Mar 2004 07:26:05 GMT
On Thu, 2004-03-25 at 12:01, Simon Kitching wrote:
> On Thu, 2004-03-25 at 04:36, Edelson, Justin wrote:
> > Since coding these changes, I've rethought whether or not this needs to
> > be a subclass - since @text() can't be a legal XML element, existing
> > code shouldn't be affected. After starting to duplicate all of the
> > existing Digester test cases to run against my new subclass, I realized
> > this was more copy and paste than I really wanted to do, so I modified the
> > current Digester from CVS instead, and all of the test cases passed.
> > 
> > I'd like to go ahead and submit the patches & test cases into bugzilla,
> > but would like to get a reading on whether or not my original assumption
> > was correct - is a new subclass (MixedContentDigester) more or less
> > likely to get submitted to CVS than a patch to Digester, all other
> > things (documentation, unit tests, the code itself) being equal?

I originally wasn't too keen on this idea, but it has been niggling away
in the back of my mind for the last week now.

Justin is quite correct that Digester can't currently handle this:
  <p>Hi, this is an <i>example</i> of some <b>bold</b>text.</p>

The best it can currently do is to build this tree:
  Hi, this is an of some text.

I think this is something that would be nice to handle. I'm still
rather negative on the proposed solution of adding "@text" to Digester
patterns to trigger special processing.

How do people feel about my initially proposed solution to this (as
quoted below)?

> Here's a suggestion for an alternative implementation. I haven't thought
> this through deeply, so it may be broken. But it would avoid having
> special pattern strings. This suggestion is just intended to stir the
> pot of potential solutions :-)
>   Define an interface called MixedContentRule or similar which rule
>   classes can implement. 
>   In Digester's startElement method:
>     if bodytext not null:
>       for each rule matched by the last call to startElement:
>         if rule implements MixedContentRule
>           call that rule's content(bodytext) method
> The effect should be that any rule which implements the MixedContentRule
> interface (and therefore has an extra content(String) method) gets its
> content method called whenever there is a piece of text followed by a
> nested element.
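
The quoted suggestion can be sketched in Java roughly as follows. This is only an illustration under several assumptions, not Digester code: MixedContentRule and its content(String) method come from the proposal above, while CollectingRule, MixedContentHandler and MixedContentDemo are hypothetical names. A plain JDK SAX handler stands in for Digester, and rules are looked up by element name rather than by Digester pattern matching.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical interface from the proposal; it is not part of Digester.
interface MixedContentRule {
    void content(String text);
}

// Hypothetical rule that records each text fragment it is handed.
class CollectingRule implements MixedContentRule {
    final List<String> fragments = new ArrayList<>();

    @Override
    public void content(String text) {
        fragments.add(text);
    }
}

// Stand-in for Digester: rules are keyed by element name, not by pattern.
class MixedContentHandler extends DefaultHandler {
    private final Map<String, List<Object>> rules = new HashMap<>();
    // Rules matched by each currently open element, innermost first.
    private final Deque<List<Object>> matched = new ArrayDeque<>();
    private final StringBuilder bodyText = new StringBuilder();

    void addRule(String elementName, Object rule) {
        rules.computeIfAbsent(elementName, k -> new ArrayList<>()).add(rule);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Text seen since the last start/end tag belongs to the enclosing element.
        if (bodyText.length() > 0 && !matched.isEmpty()) {
            for (Object rule : matched.peek()) {
                if (rule instanceof MixedContentRule) {
                    ((MixedContentRule) rule).content(bodyText.toString());
                }
            }
        }
        bodyText.setLength(0);
        matched.push(rules.getOrDefault(qName, Collections.emptyList()));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        bodyText.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The proposal only covers text followed by a nested element, so
        // trailing text is simply discarded in this sketch.
        bodyText.setLength(0);
        matched.pop();
    }
}

class MixedContentDemo {
    // Parses xml and returns the fragments delivered to a rule on <p>.
    static List<String> fragmentsForP(String xml) {
        CollectingRule rule = new CollectingRule();
        MixedContentHandler handler = new MixedContentHandler();
        handler.addRule("p", rule);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return rule.fragments;
    }
}
```

Calling MixedContentDemo.fragmentsForP on the <p> example above should deliver two fragments to the rule, "Hi, this is an " and " of some ". The trailing "text." is not followed by a nested element, so this sketch does not pass it to content(); how that text should be delivered is left open here.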

Justin, if you have any arguments to back your original design, please
speak up! Or if you are willing to try implementing some other approach
that doesn't involve "@text" patterns, please speak up too.

If someone (e.g. Justin) is keen to work on this now, we could potentially
get it into the next release. Otherwise I suggest this could go on the
to-do list for post-1.6.



