commons-dev mailing list archives

From Gary Gregory <GGreg...@seagullsoftware.com>
Subject LANG-728 to work with Lang 3.0 way of using escapeXml with > 0x7f characters [WAS RE: svn commit: r1148162 - /commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java]
Date Tue, 19 Jul 2011 14:28:15 GMT
Hi All:

I am glad to know there is a 3.0 way of doing that, which is:

    @Test
    public void testEscapeXmlSupplementaryCharacters() {
        CharSequenceTranslator escapeXml =
            StringEscapeUtils.ESCAPE_XML.with(NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE));

        assertEquals("Supplementary character must be represented using a single escape",
                "&#144308;",
                escapeXml.translate("\uD84C\uDFB4"));
    }

But what about the test as it was originally written?

        // Example from https://issues.apache.org/jira/browse/LANG-728
        assertEquals("Supplementary character must be represented using a single escape",
                "&#144308;",
                StringEscapeUtils.escapeXml("\uD84C\uDFB4"));
        // Example from See http://www.w3.org/International/questions/qa-escapes
        assertEquals("Supplementary character must be represented using a single escape",
                "&#x233B4;",
                StringEscapeUtils.escapeXml("\uD84C;\uDFB4;"));

It still fails. 

Shouldn't the API be changed to handle this case too? The W3C seems to say so ("you must use
the single, code point value for that character") in:

     * From http://www.w3.org/International/questions/qa-escapes
     * </p>
     * <blockquote>
     * Supplementary characters are those Unicode characters that have code points higher than the characters in
     * the Basic Multilingual Plane (BMP). In UTF-16 a supplementary character is encoded using two 16-bit surrogate code points from the
     * BMP. Because of this, some people think that supplementary characters need to be represented using two escapes, but this is incorrect
     * – you must use the single, code point value for that character. For example, use &#x233B4; rather than &#xD84C;&#xDFB4;.
     * </blockquote>
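
To make the W3C rule concrete, here is a minimal sketch using only the JDK. It is not the Lang implementation: the class name SupplementaryEscapeSketch and the method escapeNonAscii are hypothetical, and treating the 0x7f lower bound as inclusive is my assumption. It walks the string by code point rather than by char, so a surrogate pair yields one escape:

```java
// Hypothetical sketch, JDK only; not how StringEscapeUtils is implemented.
class SupplementaryEscapeSketch {

    /**
     * Escapes every code point at or above 0x7f as a decimal numeric entity.
     * The inclusive lower bound is an assumption made for this sketch.
     */
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            // codePointAt combines a valid surrogate pair into one code point.
            int codePoint = input.codePointAt(i);
            if (codePoint >= 0x7f) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.append((char) codePoint);
            }
            // Advance by 2 chars for a supplementary character, 1 otherwise.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "\uD84C\uDFB4";
        // Decimal form, as in the LANG-728 example.
        System.out.println(escapeNonAscii(s)); // &#144308;
        // Hexadecimal form, as in the W3C example.
        System.out.println("&#x" + Integer.toHexString(s.codePointAt(0)).toUpperCase() + ";"); // &#x233B4;
    }
}
```

Both forms name the same code point (0x233B4 is 144308 in decimal), so the two expected values in the original test differ only in notation.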

Gary

-----Original Message-----
From: bayard@apache.org [mailto:bayard@apache.org] 
Sent: Tuesday, July 19, 2011 0:58 AM
To: commits@commons.apache.org
Subject: svn commit: r1148162 - /commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java

Author: bayard
Date: Tue Jul 19 04:58:03 2011
New Revision: 1148162

URL: http://svn.apache.org/viewvc?rev=1148162&view=rev
Log:
Updating unit test for LANG-728 to work with Lang 3.0 way of using escapeXml with > 0x7f characters

Modified:
    commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java

Modified: commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java
URL: http://svn.apache.org/viewvc/commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java?rev=1148162&r1=1148161&r2=1148162&view=diff
==============================================================================
--- commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java (original)
+++ commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java Tue Jul 19 04:58:03 2011
@@ -31,6 +31,9 @@ import org.apache.commons.io.IOUtils;
 import org.junit.Ignore;
 import org.junit.Test;
 
+import org.apache.commons.lang3.text.translate.CharSequenceTranslator;
+import org.apache.commons.lang3.text.translate.UnicodeEscaper;
+
 /**
  * Unit tests for {@link StringEscapeUtils}.
  *
@@ -333,15 +336,13 @@ public class StringEscapeUtilsTest {
      * @see <a href="http://www.w3.org/International/questions/qa-escapes">Using character escapes in markup and CSS</a>
      * @see <a href="https://issues.apache.org/jira/browse/LANG-728">LANG-728</a>
      */
-    @Ignore
     @Test
     public void testEscapeXmlSupplementaryCharacters() {
-        // Example from https://issues.apache.org/jira/browse/LANG-728
-        assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
-                StringEscapeUtils.escapeXml("\uD84C\uDFB4"));
-        // Example from See http://www.w3.org/International/questions/qa-escapes
-        assertEquals("Supplementary character must be represented using a single escape", "&#x233B4;",
-                StringEscapeUtils.escapeXml("\uD84C;\uDFB4;"));
+        CharSequenceTranslator escapeXml = 
+            StringEscapeUtils.ESCAPE_XML.with( UnicodeEscaper.between(0x7f, Integer.MAX_VALUE) );
+
+        assertEquals("Supplementary character must be represented using a single escape", "\u233B4",
+                escapeXml.translate("\uD84C\uDFB4"));
     }
     
     // Tests issue #38569

