commons-dev mailing list archives

From Gary Gregory
Subject Re: [LANG] Clarification of method behavior in StringEscapeUtils
Date Sat, 01 Feb 2014 15:27:53 GMT
On Sat, Feb 1, 2014 at 9:12 AM, Benedikt Ritter wrote:

> Hi,
> right now we have the following methods in StringEscapeUtils:
> escapeXml(String)
> escapeHtml3(String)
> escapeHtml4(String)
> These methods only escape the basic xml/html entities, though they may
> produce invalid XML/HTML. LANG-955 [1] proposes to add new methods that
> only produce valid XML, they should throw an exception if a character is
> encountered that cannot be displayed in XML (not even by escaping).

How does that address the problem mentioned earlier on the ML of needing
valid XML no matter what the input is?

There are several tasks for the API(s):

- Escaping (implied by the API name)
- Dealing with non-XML chars:
  o Strip, or
  o Throw exception

The simplest solution using today's style would be:

escapeXml10(String text, boolean strip)
escapeXml11(String text, boolean strip)

strip true - strips
strip false - throws exception
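
A minimal sketch of the boolean-flag variant, to make the two behaviors
concrete. Assumptions: the class name XmlEscapeSketch is hypothetical, and
IllegalArgumentException is a placeholder since no exception type has been
named in this thread. The validity check follows the Char production of
XML 1.0.

```java
final class XmlEscapeSketch {

    private XmlEscapeSketch() {}

    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Escapes the five basic XML entities. Characters that XML 1.0 cannot
     * represent are dropped when strip is true; otherwise an exception is
     * thrown (exception type is an assumption of this sketch).
     */
    static String escapeXml10(String text, boolean strip) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isValidXml10(cp)) {
                if (strip) {
                    continue; // silently drop the character
                }
                throw new IllegalArgumentException(
                        "Not representable in XML 1.0: U+" + Integer.toHexString(cp));
            }
            switch (cp) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

With strip set to true a NUL in the input simply disappears from the output;
with strip set to false the caller has to catch the exception and decide what
to do with the text.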

What I am not sure about is why you would want an exception or what you'd
do with it.

Are these 'bad chars' embeddable in a CDATA? If so, strip false makes sense
because we really cannot handle the text. But what would the app then do
with the exception? I am not sure that I want the extra logic. Presumably,
if I am not using JAXB then I am doing my own "looser" XML IO, so I need to
escape content... I wonder what JAXB does here...

> Since the set of valid characters differs between XML 1.0 and XML 1.1, we
> need two methods:
> escapeXml_1_0(String)
> escapeXml_1_1(String)

Yuck! Underscores are of last resort.

Simple alternatives:

escapeXml10(String)
escapeXml11(String)

Until we get to XML version 10, this will be fine.

Precise alternatives:

escapeXml10_20081126 (the W3C REC for XML 1.0 *5th edition*)
escapeXml10_20060816 (the W3C REC for XML 1.0 *4th edition*)
escapeXml10_20040204 (the W3C REC for XML 1.0 *3rd edition*)

Or use a "E" or "e" for Edition instead of _

Each edition may have several versions BTW.

> To clarify the behavior of the old method I've created LANG-963 [2]. The
> idea is to rename escapeXml(String) to escapeXmlEntities(String) and
> deprecate the old method.
> Now I'm tempted to rename the HTML counterparts as well leading to either
> of the following:
> escapeHtml3Entities(String)
> escapeHtml4Entities(String)
> or:
> escapeHtml_3_Entities(String)
> escapeHtml_4_Entities(String)
> or:
> escapeHtml_3_0_Entities(String)
> escapeHtml_4_0_Entities(String)
> I find none of the three very appealing, but for code symmetry we should
> change this as well. Which one would you prefer?
> Benedikt
> P.S.: I'm planning to redesign great parts of the API. The "static util"
> pattern is outdated and it is better to encode the information we're
> trying to express here via a fluent API. My proposal for lang 4.0 would be:
> StringEscaping.escape(str).with(Escaping.HTML_4_0)
> StringEscaping.escape(str).with(Escaping.XML_ENTITIES)

Gross, don't force an API style on me, Java is verbose enough as it is. For
those in love with fluent APIs, you can provide a separate code path I
suppose. I'd rather not deal with it for low level util call sites. I am
not building an object model here.
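
A rough sketch of how the two styles could coexist, with the fluent form as
a thin layer over the static call. The names StringEscaping, Escaping and
XML_ENTITIES are taken from the proposal quoted above; the enclosing class
FluentEscapeSketch, the apply method and the implementation are assumptions
made for illustration only. It needs Java 8 for the method reference.

```java
import java.util.function.UnaryOperator;

final class FluentEscapeSketch {

    private FluentEscapeSketch() {}

    /** Static "util" style: escapes the five basic XML entities. */
    static String escapeXmlEntities(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Each constant delegates to a static method (only one shown here). */
    enum Escaping {
        XML_ENTITIES(FluentEscapeSketch::escapeXmlEntities);

        private final UnaryOperator<String> fn;

        Escaping(UnaryOperator<String> fn) {
            this.fn = fn;
        }

        String apply(String text) {
            return fn.apply(text);
        }
    }

    /** Fluent entry point: StringEscaping.escape(str).with(Escaping.XML_ENTITIES) */
    static final class StringEscaping {
        private final String text;

        private StringEscaping(String text) {
            this.text = text;
        }

        static StringEscaping escape(String text) {
            return new StringEscaping(text);
        }

        String with(Escaping escaping) {
            return escaping.apply(text);
        }
    }
}
```

Both call sites run the same code; the fluent form just allocates one small
wrapper object per call.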

Now that Java 8 lambdas are here, the style will change again.

> This way we don't have to encode everything into method names.

You still can use parameters. But first we need to decide on
strip/exception policies.
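
For illustration, a sketch of the parameter route with the XML version and
the strip/exception policy passed as enums instead of being encoded in the
method name. Everything here is hypothetical (XmlEscapeParams, XmlVersion,
InvalidCharPolicy, the use of IllegalArgumentException); only the character
ranges come from the Char and RestrictedChar productions of the XML 1.0 and
1.1 Recommendations. In the 1.1 case, restricted characters are written as
character references, since they may not appear literally.

```java
final class XmlEscapeParams {

    enum XmlVersion { XML_1_0, XML_1_1 }

    enum InvalidCharPolicy { STRIP, THROW }

    private XmlEscapeParams() {}

    /** True if the code point can appear at all in the given XML version. */
    static boolean isAllowed(int cp, XmlVersion version) {
        boolean common = (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
        if (version == XmlVersion.XML_1_0) {
            return common || cp == 0x9 || cp == 0xA || cp == 0xD;
        }
        return common || (cp >= 0x1 && cp <= 0x1F); // XML 1.1 excludes only NUL here
    }

    /** RestrictedChar of XML 1.1: legal only as a character reference. */
    static boolean isRestricted11(int cp) {
        return (cp >= 0x1 && cp <= 0x8) || cp == 0xB || cp == 0xC
                || (cp >= 0xE && cp <= 0x1F)
                || (cp >= 0x7F && cp <= 0x84)
                || (cp >= 0x86 && cp <= 0x9F);
    }

    static String escapeXml(String text, XmlVersion version, InvalidCharPolicy policy) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isAllowed(cp, version)) {
                if (policy == InvalidCharPolicy.STRIP) {
                    continue;
                }
                throw new IllegalArgumentException(
                        "Not representable: U+" + Integer.toHexString(cp));
            }
            if (version == XmlVersion.XML_1_1 && isRestricted11(cp)) {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
                continue;
            }
            switch (cp) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

The same U+0001 input is dropped under XML_1_0 with STRIP but survives as a
character reference under XML_1_1, which is the difference the two method
names are meant to capture.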


> I've created
> LANG-964 [3] for this.
> [1]
> [2]
> [3]
> --
