commons-issues mailing list archives

From "Sergey Bushik (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LANG-859) org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification
Date Wed, 21 Nov 2012 16:21:58 GMT

     [ https://issues.apache.org/jira/browse/LANG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Bushik updated LANG-859:
-------------------------------

    Description: 
According to the XML 1.0 specification, some Unicode characters are not allowed in the content of an XML document (see http://www.w3.org/TR/xml/#charsets).
StringEscapeUtils.escapeXml(value) should escape such characters as &#x<hex-code>; or &#<dec-code>;

<pre>
// Assumed imports: org.apache.commons.lang.StringEscapeUtils, org.w3c.dom.Document,
// javax.xml.parsers.DocumentBuilderFactory, java.io.ByteArrayInputStream,
// and a static import of org.junit.Assert.assertEquals.
public static void main(String[] args) throws Exception {
    String xmlValidText = "good";
    // Passes assertion (expected value first, actual value second)
    assertEquals("good", StringEscapeUtils.escapeXml(xmlValidText));

    char xmlInvalidChar = (char) 0x2;
    String xmlInvalidText = String.valueOf(xmlInvalidChar);
    // Fails assertion
    assertEquals("&#x2;", StringEscapeUtils.escapeXml(xmlInvalidText));

    System.out.println("Is invalid: " + org.apache.xerces.util.XMLChar.isInvalid(xmlInvalidChar));
    String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<chars>" +
            "<valid>" + StringEscapeUtils.escapeXml(xmlValidText) + "</valid>" +
            "<invalid>" + StringEscapeUtils.escapeXml(xmlInvalidText) + "</invalid>" +
            "</chars>";
    // Parsing fails: an invalid XML character (Unicode: 0x2) was found in the element content of the document
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    System.out.println(document);
}
</pre>
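
For illustration only, the requested behavior could look roughly like the sketch below. The class and method names (Xml10CharEscapeSketch, isXml10Char, escapeInvalidXml10Chars) are hypothetical and are not part of Commons Lang; the range check follows the Char production in the section of the specification linked above. The helper only handles the disallowed characters and leaves the escaping of markup characters such as < and & to escapeXml. One caveat worth keeping in mind: XML 1.0 applies the same Char constraint to character references, so an XML 1.0 parser may still reject &#x2;, whereas XML 1.1 accepts references to the control characters #x1 to #x1F.

```java
public final class Xml10CharEscapeSketch {

    // True if the code point matches the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isXml10Char(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Hypothetical helper: replaces every code point outside the Char production
    // with a hexadecimal character reference and copies everything else unchanged.
    public static String escapeInvalidXml10Chars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            if (isXml10Char(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append("&#x").append(Integer.toHexString(codePoint)).append(';');
            }
            // Advance by one or two chars depending on whether this was a surrogate pair
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeInvalidXml10Chars("good"));                // prints good
        System.out.println(escapeInvalidXml10Chars("a" + (char) 0x2 + "b")); // prints a&#x2;b
    }
}
```

Iterating by code point rather than by char keeps supplementary characters intact and lets an unpaired surrogate be treated as a disallowed character.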

> org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LANG-859
>                 URL: https://issues.apache.org/jira/browse/LANG-859
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 2.6
>            Reporter: Sergey Bushik

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
