commons-issues mailing list archives

From "Adam Hooper (JIRA)" <>
Subject [jira] [Created] (LANG-955) StringEscapeUtils.escapeXml doesn't remove invalid characters
Date Tue, 21 Jan 2014 14:53:19 GMT
Adam Hooper created LANG-955:

             Summary: StringEscapeUtils.escapeXml doesn't remove invalid characters
                 Key: LANG-955
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 3.1
         Environment: Ubuntu 13.10
            Reporter: Adam Hooper

escapeXml lets non-text characters pass through into XML files:

scala> org.apache.commons.lang3.StringEscapeUtils.escapeXml("\u0004").codePointAt(0)
res4: Int = 4

I would expect this call to fail with an exception: either from StringEscapeUtils, refusing to
encode the character, or, preferably, from String.codePointAt, complaining that the string is
empty because escapeXml removed the character. \u0004 is not a valid character in XML 1.0, and
there is no way to represent it in an XML document, not even by escaping it.
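Both outcomes described above depend on the XML 1.0 "Char" production, which allows only #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF]. The following is a minimal Scala sketch, assuming plain JDK strings and no Commons Lang dependency. The names Xml10Chars, isXml10Char, stripInvalid and requireValid are hypothetical and used only for illustration. They are not part of StringEscapeUtils.

```scala
// Hypothetical helpers (not Commons Lang API) sketching the two outcomes
// requested in this report: drop invalid characters, or refuse the input.
object Xml10Chars {
  // XML 1.0 "Char" production:
  // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
  def isXml10Char(cp: Int): Boolean =
    cp == 0x9 || cp == 0xA || cp == 0xD ||
      (cp >= 0x20 && cp <= 0xD7FF) ||
      (cp >= 0xE000 && cp <= 0xFFFD) ||
      (cp >= 0x10000 && cp <= 0x10FFFF)

  // Preferred outcome: remove code points that XML 1.0 cannot represent.
  // Unpaired surrogates come back from codePointAt as 0xD800-0xDFFF and are dropped too.
  def stripInvalid(s: String): String = {
    val sb = new java.lang.StringBuilder(s.length)
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      if (isXml10Char(cp)) sb.appendCodePoint(cp)
      i += Character.charCount(cp)
    }
    sb.toString
  }

  // Alternative outcome: refuse any input that contains a disallowed code point.
  def requireValid(s: String): String = {
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      if (!isXml10Char(cp))
        throw new IllegalArgumentException(f"U+$cp%04X is not allowed in XML 1.0")
      i += Character.charCount(cp)
    }
    s
  }
}
```

With this sketch, Xml10Chars.stripInvalid("\u0004") returns an empty string, so calling .codePointAt(0) on the result throws an IndexOutOfBoundsException, which matches the preferred behavior. requireValid takes the stricter route and throws an IllegalArgumentException instead.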

Wikipedia summarizes which characters are not allowed in XML, even after escaping. The reason
for disallowing them: XML is a text interchange format, and control characters are not text.

If StringEscapeUtils.escapeXml lets invalid XML characters through, whether escaped or not, it
generates invalid XML, and conforming XML parsers will refuse to read the resulting files.
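The claim about parsers can be checked against the XML parser bundled with the JDK. The following is a minimal Scala sketch that uses only standard javax.xml and org.xml.sax classes. The ParseCheck object and its parses helper are hypothetical names introduced for illustration.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource
import org.xml.sax.helpers.DefaultHandler
import scala.util.Try

// Hypothetical helper: does the JDK's bundled XML 1.0 parser accept `xml`?
object ParseCheck {
  def parses(xml: String): Boolean = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    // DefaultHandler rethrows fatal errors instead of printing them to stderr.
    builder.setErrorHandler(new DefaultHandler)
    Try(builder.parse(new InputSource(new StringReader(xml)))).isSuccess
  }
}
```

With this sketch, ParseCheck.parses("<a>ok</a>") should succeed. The raw form "<a>\u0004</a>" and the escaped form "<a>&#4;</a>" should both be rejected as not well-formed, which is why escaping alone cannot make such output valid.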

This message was sent by Atlassian JIRA
