commons-dev mailing list archives

From Gary Gregory <garydgreg...@gmail.com>
Subject Re: [LANG] Clarification of method behavior in StringEscapeUtils
Date Sat, 01 Feb 2014 15:27:53 GMT
On Sat, Feb 1, 2014 at 9:12 AM, Benedikt Ritter <britter@apache.org> wrote:

> Hi,
>
> right now we have the following methods in StringEscapeUtils:
>
> escapeXml(String)
> escapeHtml3(String)
> escapeHtml4(String)
>
> These methods only escape the basic xml/html entities, though they may
> produce invalid XML/HTML. LANG-955 [1] proposes to add new methods that
> only produce valid XML, they should throw an exception if a character is
> encountered that cannot be displayed in XML (not even by escaping).
>

How does that address the problem mentioned earlier on the ML of needing
valid XML no matter what the input?

There are several tasks for the API(s):

- Escaping (implied by the API name)
- Dealing with non-XML chars:
  o Strip, or
  o Throw exception

The simplest solution using today's style would be:

escapeXml10(String text, boolean strip)
escapeXml11(String text, boolean strip)

strip true - strips
strip false - throws exception
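
A minimal sketch of what the XML 1.0 variant could look like, only to make
the strip/throw choice concrete. The class name XmlEscapeSketch, the helper
isXml10Char, and the choice of IllegalArgumentException are assumptions for
illustration, not an existing or agreed Commons Lang API. The valid ranges
follow the Char production of the XML 1.0 spec.

```java
// Hypothetical sketch; not part of Commons Lang.
final class XmlEscapeSketch {

    private XmlEscapeSketch() {
    }

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Escapes the five basic XML entities.
     * strip == true  : characters not allowed in XML 1.0 are dropped.
     * strip == false : the first such character raises an exception.
     */
    static String escapeXml10(String text, boolean strip) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int index = i;
            i += Character.charCount(cp);
            if (!isXml10Char(cp)) {
                if (strip) {
                    continue; // silently drop the character
                }
                throw new IllegalArgumentException(String.format(
                        "Invalid XML 1.0 character U+%04X at index %d", cp, index));
            }
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

An escapeXml11 variant would differ only in the character test. Note that an
unpaired surrogate counts as invalid here, so it is stripped or rejected
like any other non-XML character.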

What I am not sure about is why you would want an exception or what you'd
do with it.

Are these 'bad chars' embeddable in a CDATA? If so, strip false makes sense
because we really cannot handle the text. But what would the app then do
with the exception? I am not sure that I want the extra logic. Presumably,
if I am not using JAXB then I am doing my own "looser" XML IO, so I need to
escape content... I wonder what JAXB does here...


>
> Since the set of valid characters differs between XML 1.0 and XML 1.1, we
> need two methods:
>
> escapeXml_1_0(String)
> escapeXml_1_1(String)
>

Yuck! Underscores are a last resort.

Simple alternatives:

escapeXml10
escapeXml11
escapeXmlV10
escapeXmlV11

Until we get to XML version 10, this will be fine.

Precise alternatives:

escapeXml10_20081126 (the W3C REC for XML 1.0 *5th edition* is
http://www.w3.org/TR/2008/REC-xml-20081126/)
escapeXml10_20060816 (the W3C REC for XML 1.0 *4th edition* is
http://www.w3.org/TR/2006/REC-xml-20060816/)
escapeXml10_20040204 (the W3C REC for XML 1.0 *3rd edition* is
http://www.w3.org/TR/2004/REC-xml-20040204/)

Or use an "E" or "e" for Edition instead of _:
escapeXml10E20081126
escapeXml10e20081126

Each edition may have several versions BTW.


>
> To clarify the behavior of the old method I've created LANG-963 [2]. The
> idea is to rename escapeXml(String) to escapeXmlEntities(String) and
> deprecate the old method.
>
> Now I'm tempted to rename the HTML counterparts as well leading to either
> of the following:
>
> escapeHtml3Entities(String)
> escapeHtml4Entities(String)
>
> or:
>
> escapeHtml_3_Entities(String)
> escapeHtml_4_Entities(String)
>
> or:
>
> escapeHtml_3_0_Entities(String)
> escapeHtml_4_0_Entities(String)
>
> I find none of the three very appealing, but for code symmetry we should
> change this as well. Which one would you prefer?
>
> Benedikt
>
> P.S.: I'm planning to redesign large parts of the API. The "static util"
> pattern is outdated and it is better to encode the information we're
> trying to express here via a fluent API. My proposal for lang 4.0 would be:
>
> StringEscaping.escape(str).with(Escaping.HTML_4_0)
> StringEscaping.escape(str).with(Escaping.XML_ENTITIES)
>

Gross, don't force an API style on me, Java is verbose enough as it is. For
those in love with fluent APIs, you can provide a separate code path I
suppose. I'd rather not deal with it for low level util call sites. I am
not building an object model here.

Now that Java 8 lambdas are here, the style will change again.


>
> This way we don't have to encode everything into method names.


You can still use parameters. But first we need to decide on
strip/exception policies.
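
As a rough illustration of the parameter route, here is a sketch with one
static method that takes the XML version and the invalid-character policy as
enums. All names here (XmlEscapeParams, XmlVersion, InvalidCharPolicy,
escapeXml) are hypothetical and nothing like this has been agreed on. The
character ranges follow the Char and RestrictedChar productions of the XML
1.0 and 1.1 specs, and writing restricted XML 1.1 characters as numeric
references is one possible choice, not a settled one.

```java
// Hypothetical sketch; not part of Commons Lang.
final class XmlEscapeParams {

    enum InvalidCharPolicy { STRIP, THROW }

    enum XmlVersion {
        XML_1_0 {
            @Override
            boolean isAllowed(int cp) {
                return cp == 0x9 || cp == 0xA || cp == 0xD
                        || (cp >= 0x20 && cp <= 0xD7FF)
                        || (cp >= 0xE000 && cp <= 0xFFFD)
                        || (cp >= 0x10000 && cp <= 0x10FFFF);
            }
        },
        XML_1_1 {
            @Override
            boolean isAllowed(int cp) {
                return (cp >= 0x1 && cp <= 0xD7FF)
                        || (cp >= 0xE000 && cp <= 0xFFFD)
                        || (cp >= 0x10000 && cp <= 0x10FFFF);
            }

            @Override
            boolean isRestricted(int cp) {
                return (cp >= 0x1 && cp <= 0x8) || cp == 0xB || cp == 0xC
                        || (cp >= 0xE && cp <= 0x1F)
                        || (cp >= 0x7F && cp <= 0x84)
                        || (cp >= 0x86 && cp <= 0x9F);
            }
        };

        /** True if the code point may appear in a document of this version. */
        abstract boolean isAllowed(int cp);

        /** True if the code point is allowed only as a character reference. */
        boolean isRestricted(int cp) {
            return false;
        }
    }

    private XmlEscapeParams() {
    }

    static String escapeXml(String text, XmlVersion version, InvalidCharPolicy policy) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!version.isAllowed(cp)) {
                if (policy == InvalidCharPolicy.THROW) {
                    throw new IllegalArgumentException(
                            String.format("Invalid character U+%04X for %s", cp, version));
                }
                continue; // STRIP: drop the character
            }
            if (version.isRestricted(cp)) {
                out.append("&#").append(cp).append(';'); // decimal character reference
                continue;
            }
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

A call site stays a plain static call, for example
XmlEscapeParams.escapeXml(str, XmlVersion.XML_1_0, InvalidCharPolicy.STRIP),
and nothing has to be encoded into the method name.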

Gary



> I've created
> LANG-964 [3] for this.
>
> [1] https://issues.apache.org/jira/browse/LANG-955
> [2] https://issues.apache.org/jira/browse/LANG-963
> [3] https://issues.apache.org/jira/browse/LANG-964
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter
>



-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
