commons-dev mailing list archives

From Adam Hooper <a...@adamhooper.com>
Subject Re: [LANG] Clarification of method behavior in StringEscapeUtils
Date Mon, 03 Feb 2014 13:13:22 GMT
On Sun, Feb 2, 2014 at 2:00 PM, Benedikt Ritter <britter@apache.org> wrote:
>
> 2014-02-01 Gary Gregory <garydgregory@gmail.com>:
>
>> On Sat, Feb 1, 2014 at 9:12 AM, Benedikt Ritter <britter@apache.org>
>> wrote:
>>
>> >
>> > These methods only escape the basic xml/html entities, though they may
>> > produce invalid XML/HTML. LANG-955 [1] proposes to add new methods that
>> > only produce valid XML, they should throw an exception if a character is
>> > encountered that cannot be displayed in XML (not even by escaping).
>>
>> How does that the problem mentioned earlier on the ML of needing valid XML
>> no matter what the input?
>>
>
> I don't understand that sentence, sorry :o)

As the author of that patch, my two pence:

Some characters cannot be encoded in XML at all -- especially in XML
1.0. That's because XML is a text-only format: most control characters
(U+0000, for example) are not legal, not even as character references.
(This inspired Microsoft, when it created its XML document formats, to
invent a new encoding scheme ("xstring", I think) that uses valid XML
characters to encode invalid ones. Luckily, that encoding scheme never
caught on outside of Microsoft-land.)
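
For reference, the set of legal characters is the "Char" production in
the XML 1.0 spec, and it can be written as a small Java predicate. This
is only a sketch; the class and method names are made up for
illustration and are not part of commons-lang.

```java
public final class Xml10Chars {
    private Xml10Chars() {}

    /**
     * Returns true if the code point matches the XML 1.0 "Char" production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    public static boolean isXml10Char(int codePoint) {
        return codePoint == 0x9
            || codePoint == 0xA
            || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }
}
```

Anything for which this returns false (U+0001, U+FFFE, a lone
surrogate) has no representation in an XML 1.0 document.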

While there's nothing _wrong_ with escapeXml as it stands right now
(i.e., the code agrees with the docs), I argue that it doesn't solve
the actual problem people are using it for: people want to escape
strings for inclusion in XML documents, and escapeXml does not do
that.

I think escapeXml should not output invalid XML ever.

Presumably escapeXml() is being used today for lots of XML documents,
and in effect it already throws a brutal exception: a conforming XML
parser will reject the document when it reaches an invalid character.
That speaks to the severity of the problem (it makes that data very
hard to get at), and to the rarity of the problem (there haven't been
many bug reports about this).
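
To make the "brutal exception" concrete, here is a minimal sketch that
feeds a document to the JDK's default DOM parser and reports whether it
was accepted. The class and method names are hypothetical; only the
javax.xml and org.xml.sax calls are real JDK API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class ParserCheck {
    private ParserCheck() {}

    /** Returns true if the default JDK parser accepts the document, false if it rejects it. */
    public static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness errors, including illegal characters, end up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this helper, a document such as "<a>ok</a>" is accepted, while one
whose text contains a raw U+0001 is rejected, and so is one that tries
to smuggle it in as the character reference "&#1;".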

>> There are several tasks for the API(s):
>>
>> - Escaping (implied by the API name)
>> - Dealing with non-XML chars:
>>   o Strip, or
>>   o Throw exception
>>
>> The simplest solution using today's style would be:
>>
>> escapeXml10(String text, boolean strip)
>> escapeXml11(String text, boolean strip)
>>
>> strip true - strips
>> strip false - throws exception
>>
>
> A boolean flag that controls whether a method throws an exception or not?
> An exceptional situation is nothing that is configurable, imho.
>
>> What I am not sure on is why you would want an exception or what you'd do
>> with it.
>>
>> Are these 'bad chars' embeddable in a CDATA? If so, strip false makes sense
>> because we really cannot handle the text. But what would the app then do
>> with the exception?

I originally thought an exception would be useful, but I changed my
mind as I wrote the patch. Some reasons:

* What kind of exception? It isn't really an IOException, and the API
doesn't seem keen on adding other kinds.

* What would the user want to do with it? Re-run the operation in its
exception-free incarnation?

An exception might be useful for some people, but I think it would be
right to steer those people towards a different API -- maybe not a
part of commons-lang.
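
For illustration, the exception-free ("strip") behaviour discussed
above could look roughly like the following. This is a sketch under
assumptions, not the LANG-955 patch: the class and method names are
invented, and it escapes only the five basic XML entities.

```java
public final class StrippingXmlEscaper {
    private StrippingXmlEscaper() {}

    /** Escapes the five basic XML entities and silently drops characters illegal in XML 1.0. */
    public static String escapeAndStrip(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isXml10Char(cp)) {
                continue; // cannot be represented in XML 1.0, so drop it
            }
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    // XML 1.0 "Char" production; unpaired surrogates fall outside every range.
    private static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```

The output of such a method can always be dropped into an XML 1.0
document, at the cost of silently losing the stripped characters.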

Enjoy life,
Adam

-- 
My Phone (mobile): +1 613 986 3339
My Website: http://adamhooper.com
My Twitter: http://twitter.com/adamhooper
