commons-user mailing list archives

From robert burrell donkin <robertburrelldon...@blueyonder.co.uk>
Subject Re: Digester trimming whitespaces
Date Thu, 07 Oct 2004 20:59:37 GMT

On 3 Oct 2004, at 22:51, Simon Kitching wrote:

> On Mon, 2004-10-04 at 11:33, robert burrell donkin wrote:
>>>
>>> I would recommend that you take a copy of the source of whatever rule
>>> is causing you problems and rename the class (including changing the
>>> package declaration to something in your namespace), then delete the
>>> trim() call.
>>
>> i'm not sure whether this would do it.
>>
>> i suspect that what would be needed would be for an additional flag to
>> be added to digester that would pass on all calls to
>> ignorableWhitespace to characters. depending on the parser used, some
>> configuration may be necessary to ensure that the whitespace is passed
>> on to digester.
>
> My understanding of "ignorable whitespace" is that when there is no DTD
> or schema, whitespace is never ignorable; any text within an element is
> reported via the "characters" callback. When there is a DTD or schema
> present, and it indicates that a particular element has "element
> content only" then any whitespace found in the element is reported as
> "ignorable whitespace" instead of being reported via the "characters"
> method.
>
> So as far as I can see, this is not relevant to Digester. If a document
> has a schema/DTD and that DTD specifies that element <foo> is not
> supposed to have any text within it (just child elements) then we
> really don't care about whether there is whitespace present or not.

+1

i've had a poke around and i reckon that you're probably right on this
one.
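
for anyone who wants to check the behaviour described above, here is a minimal sketch using plain JAXP/SAX (no Digester involved). the class name WhitespaceCallbacks, the sample document and the counting handler are made up for illustration. it parses the same body twice: once with no DTD, and once with an internal DTD that declares <root> as element-only content and with validation switched on. the exact behaviour without validation can vary between parsers, so treat this as an experiment rather than a guarantee.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceCallbacks {

    /** Counts how many characters arrive through each SAX callback. */
    static class CountingHandler extends DefaultHandler {
        int viaCharacters;
        int viaIgnorableWhitespace;

        @Override
        public void characters(char[] ch, int start, int length) {
            viaCharacters += length;
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            viaIgnorableWhitespace += length;
        }
    }

    // Illustrative document: <root> contains only whitespace and a <child>.
    static final String BODY = "<root>\n  <child>text</child>\n</root>";

    // Internal DTD subset declaring <root> as "element content only".
    static final String DTD =
        "<!DOCTYPE root [\n"
        + "<!ELEMENT root (child)>\n"
        + "<!ELEMENT child (#PCDATA)>\n"
        + "]>\n";

    static CountingHandler parse(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        SAXParser parser = factory.newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CountingHandler noDtd = parse(BODY, false);
        CountingHandler withDtd = parse(DTD + BODY, true);
        System.out.println("no DTD:   characters=" + noDtd.viaCharacters
            + " ignorable=" + noDtd.viaIgnorableWhitespace);
        System.out.println("with DTD: characters=" + withDtd.viaCharacters
            + " ignorable=" + withDtd.viaIgnorableWhitespace);
    }
}
```

if the whitespace inside <root> moves from the characters count to the ignorable count once the DTD is present, that matches the explanation quoted above: only whitespace in element-only content is affected, which is content a body-text rule has no interest in anyway.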

> I think it might be possible for the Digester class itself to trim or
> not trim the body text, instead of the individual rules doing it. But
> that would then force the same "to trim or not to trim" setting to be
> present for every rule, making it impossible (for example) to allow
> whitespace in text within the <description> element but to ignore it
> inside the <location-code> element.

+1

on reflection, i'd probably support adding, to those rules that trim,
either a property (to allow trimming or not) or alternatively a
post-processing hook for subclasses.
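
to make the two options concrete, here is a rough sketch of what they might look like. none of this is existing Digester API: BodyTextRule, setTrim and prepareBodyText are hypothetical names, and the class deliberately stands alone rather than extending any real rule.

```java
public class TrimOptionsSketch {

    /** Hypothetical stand-in for a rule that consumes element body text. */
    static class BodyTextRule {
        // Option 1: a per-rule property; defaults to the current trimming behaviour.
        private boolean trim = true;

        public void setTrim(boolean trim) {
            this.trim = trim;
        }

        public boolean isTrim() {
            return trim;
        }

        // Option 2: a post-processing hook that a subclass may override.
        // The default implementation simply honours the trim property.
        protected String prepareBodyText(String rawBodyText) {
            return trim ? rawBodyText.trim() : rawBodyText;
        }

        /** What the rule would do with the body text before using it. */
        public final String body(String rawBodyText) {
            return prepareBodyText(rawBodyText);
        }
    }

    /** A subclass using the hook to strip trailing whitespace only. */
    static class KeepLeadingWhitespaceRule extends BodyTextRule {
        @Override
        protected String prepareBodyText(String rawBodyText) {
            return rawBodyText.replaceAll("\\s+$", "");
        }
    }

    public static void main(String[] args) {
        BodyTextRule locationCode = new BodyTextRule();   // trims, as today
        BodyTextRule description = new BodyTextRule();
        description.setTrim(false);                       // keeps whitespace

        System.out.println("[" + locationCode.body("  AB12 \n") + "]");
        System.out.println("[" + description.body("  some text \n") + "]");
        System.out.println("[" + new KeepLeadingWhitespaceRule().body("  indented \n") + "]");
    }
}
```

because the setting lives on each rule instance, one rule can keep whitespace for <description> while another still trims <location-code>, which avoids the all-or-nothing problem of a single flag on the Digester class.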

- robert

