commons-user mailing list archives

From Simon Kitching
Subject Re: Digester trimming whitespaces
Date Sun, 03 Oct 2004 21:51:51 GMT
On Mon, 2004-10-04 at 11:33, robert burrell donkin wrote:
> >
> > I would recommend that you take a copy of the source of whatever rule
> > is causing you problems and rename the class (including changing the
> > package declaration to something in your namespace), then delete the
> > trim() call.
> i'm not sure whether this would do it.
> i suspect that what would be needed would be for an additional flag to 
> be added to digester that would pass on all calls to 
> ignorableWhitespace to characters. depending on the parser used, some 
> configuration may be necessary to ensure that the whitespace is passed 
> on to digester.

My understanding of "ignorable whitespace" is that when there is no DTD
or schema, whitespace is never ignorable; any text within an element is
reported via the "characters" callback. When there is a DTD or schema
present, and it indicates that a particular element has "element content
only", then any whitespace found in the element is reported as "ignorable
whitespace" instead of being reported via the "characters" method.
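A small sketch of that behaviour, using only the JAXP/SAX classes that ship
with the JDK. The class name WhitespaceDemo, the Counter handler and the two
sample documents are made up for illustration, and I am assuming a validating
parser for the DTD case, since that is where the SAX contract requires
ignorableWhitespace to be used:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceDemo {
    // Same element structure in both documents; only the second has a DTD.
    static final String NO_DTD = "<foo>\n  <bar>x</bar>\n</foo>";
    static final String WITH_DTD =
        "<!DOCTYPE foo [\n"
        + "  <!ELEMENT foo (bar)>\n"        // foo has element content only
        + "  <!ELEMENT bar (#PCDATA)>\n"
        + "]>\n" + NO_DTD;

    // Counts whitespace-only text by the callback that delivered it.
    static class Counter extends DefaultHandler {
        int viaCharacters;
        int viaIgnorable;

        @Override
        public void characters(char[] ch, int start, int length) {
            if (new String(ch, start, length).trim().isEmpty()) {
                viaCharacters += length;
            }
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            viaIgnorable += length;
        }
    }

    static Counter parse(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        SAXParser parser = factory.newSAXParser();
        Counter counter = new Counter();
        parser.parse(new InputSource(new StringReader(xml)), counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        Counter a = parse(NO_DTD, false);
        Counter b = parse(WITH_DTD, true);
        System.out.println("no DTD:   characters=" + a.viaCharacters
            + " ignorable=" + a.viaIgnorable);
        System.out.println("with DTD: characters=" + b.viaCharacters
            + " ignorable=" + b.viaIgnorable);
    }
}
```

Without the DTD, the whitespace around <bar> should arrive through
characters; with it, the same whitespace should arrive through
ignorableWhitespace, because <foo> is declared to contain only elements.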

So as far as I can see, this is not relevant to Digester. If a document
has a schema/DTD and that DTD specifies that element <foo> is not
supposed to have any text within it (just child elements) then we really
don't care about whether there is whitespace present or not.

I think it might be possible for the Digester class itself to trim or
not trim the body text, instead of the individual rules doing it. But
that would then force the same "to trim or not to trim" setting to be
present for every rule, making it impossible (for example) to allow
whitespace in text within the <description> element but to ignore it
inside the <location-code> element.
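
To illustrate why a per-element choice is useful, here is a rough sketch
written against plain SAX rather than Digester. Nothing in it is Digester
API: PerElementTrim, BodyCollector and the keepWhitespace set are
hypothetical names, and the set simply stands in for "this rule does not
trim":

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PerElementTrim {
    // Collects body text per element and trims it unless the element
    // name is listed in keepWhitespace.
    static class BodyCollector extends DefaultHandler {
        private final Set<String> keepWhitespace;
        private final Deque<StringBuilder> stack = new ArrayDeque<>();
        final Map<String, String> bodies = new HashMap<>();

        BodyCollector(Set<String> keepWhitespace) {
            this.keepWhitespace = keepWhitespace;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            stack.push(new StringBuilder()); // fresh buffer for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            stack.peek().append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String text = stack.pop().toString();
            bodies.put(qName, keepWhitespace.contains(qName) ? text : text.trim());
        }
    }

    static Map<String, String> parse(String xml, Set<String> keep) throws Exception {
        BodyCollector handler = new BodyCollector(keep);
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.bodies;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<item><description>  two  spaces  </description>"
            + "<location-code> AB12 </location-code></item>";
        Map<String, String> bodies = parse(xml, Collections.singleton("description"));
        System.out.println("[" + bodies.get("description") + "]");
        System.out.println("[" + bodies.get("location-code") + "]");
    }
}
```

A single Digester-wide flag would correspond to making keepWhitespace
all-or-nothing, which is exactly the restriction described above.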


