commons-user mailing list archives

From robert burrell donkin
Subject Re: Digester trimming whitespaces
Date Thu, 07 Oct 2004 20:59:37 GMT

On 3 Oct 2004, at 22:51, Simon Kitching wrote:

> On Mon, 2004-10-04 at 11:33, robert burrell donkin wrote:
>>> I would recommend that you take a copy of the source of whatever
>>> rule is causing you problems and rename the class (including
>>> changing the package declaration to something in your namespace),
>>> then delete the trim() call.
>> i'm not sure whether this would do it.
>> i suspect that what would be needed would be for an additional flag to
>> be added to digester that would pass on all calls to
>> ignorableWhitespace to characters. depending on the parser used, some
>> configuration may be necessary to ensure that the whitespace is passed
>> on to digester.
> My understanding of "ignorable whitespace" is that when there is no DTD
> or schema, whitespace is never ignorable; any text within an element is
> reported via the "characters" callback. When there is a DTD or schema
> present, and it indicates that a particular element has "element
> content only" then any whitespace found in the element is reported as
> "ignorable whitespace" instead of being reported via the "characters"
> method.
> So as far as I can see, this is not relevant to Digester. If a document
> has a schema/DTD and that DTD specifies that element <foo> is not
> supposed to have any text within it (just child elements) then we
> really don't care about whether there is whitespace present or not.


i've had a poke around and i reckon that you're probably right on this.
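
for anyone who wants to check this against their own parser, here is a minimal sketch using only the JDK's SAX API. the class name WhitespaceDemo and the sample document are made up for illustration, and the DTD case turns validation on because non-validating parsers are not obliged to report ignorable whitespace at all.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceDemo {

    /** Records what arrives through each of the two SAX callbacks. */
    static class Counter extends DefaultHandler {
        final StringBuilder text = new StringBuilder(); // everything seen by characters()
        int ignorable = 0;                              // chars seen by ignorableWhitespace()

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorable += length;
        }
    }

    /** Parses the given XML string and returns the callback counts. */
    static Counter parse(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        SAXParser parser = factory.newSAXParser();
        Counter counter = new Counter();
        parser.parse(new InputSource(new StringReader(xml)), counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document: <foo> holds only a child element plus whitespace.
        String body = "<foo>\n  <bar>x</bar>\n</foo>";
        String dtd = "<!DOCTYPE foo [<!ELEMENT foo (bar)><!ELEMENT bar (#PCDATA)>]>";

        Counter plain = parse(body, false);        // no DTD at all
        Counter withDtd = parse(dtd + body, true); // DTD says <foo> is element-only

        System.out.println("no DTD:   characters=" + plain.text.length()
                + " ignorable=" + plain.ignorable);
        System.out.println("with DTD: characters=" + withDtd.text.length()
                + " ignorable=" + withDtd.ignorable);
    }
}
```

without the DTD the whitespace inside <foo> should show up in the characters count; with it, the same whitespace should move over to the ignorable count, which is the case Simon describes.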

> I think it might be possible for the Digester class itself to trim or
> not trim the body text, instead of the individual rules doing it. But
> that would then force the same "to trim or not to trim" setting to be
> present for every rule, making it impossible (for example) to allow
> whitespace in text within the <description> element but to ignore it
> inside the <location-code> element.


on reflection, i'd probably support adding a property (to allow trimming
or not) or, alternatively, a post-processing hook for subclasses, to
those rules that trim.
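
a rough sketch of what i have in mind, written as a standalone class rather than against the real rule classes. BodyTextRule, setTrim and processBody are hypothetical names for illustration only; none of them exist in digester today.

```java
public class TrimSketch {

    /** Hypothetical stand-in for a rule that receives an element's body text. */
    static class BodyTextRule {
        private boolean trim = true; // default mirrors the existing always-trim behaviour
        private String value;

        /** The property: switch trimming on or off for this rule only. */
        public void setTrim(boolean trim) {
            this.trim = trim;
        }

        public boolean isTrim() {
            return trim;
        }

        /** The hook: subclasses may override this to post-process the body text. */
        protected String processBody(String text) {
            return trim ? text.trim() : text;
        }

        /** Called with the raw body text of the matched element. */
        public void body(String text) {
            value = processBody(text);
        }

        public String getValue() {
            return value;
        }
    }

    public static void main(String[] args) {
        // One rule per element, so each element can make its own choice.
        BodyTextRule description = new BodyTextRule();
        description.setTrim(false);                   // keep whitespace in <description>
        BodyTextRule locationCode = new BodyTextRule(); // trim inside <location-code>

        description.body("  some text \n");
        locationCode.body("  AB12 \n");

        System.out.println("[" + description.getValue() + "]");
        System.out.println("[" + locationCode.getValue() + "]");
    }
}
```

because the setting lives on the rule rather than on the Digester, this keeps the per-element choice Simon was worried about losing.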

- robert
