commons-dev mailing list archives

From Henri Yandell <flame...@gmail.com>
Subject Re: LANG-728 to work with Lang 3.0 way of using escapeXml with > 0x7f characters [WAS RE: svn commit: r1148162 - /commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java]
Date Tue, 19 Jul 2011 16:35:30 GMT
So you're not saying that we have to escape > 0x7f (old behaviour),
but that we have to escape any supplementary characters?

Hen

On Tue, Jul 19, 2011 at 7:28 AM, Gary Gregory
<GGregory@seagullsoftware.com> wrote:
> Hi All:
>
> I am glad to know there is a 3.0 way of doing that, which is:
>
>    @Test
>    public void testEscapeXmlSupplementaryCharacters() {
>        CharSequenceTranslator escapeXml =
>            StringEscapeUtils.ESCAPE_XML.with( NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) );
>
>        assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
>                escapeXml.translate("\uD84C\uDFB4"));
>
>  but what about the test the way it was originally written?
>
>        // Example from https://issues.apache.org/jira/browse/LANG-728
>        assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
>                StringEscapeUtils.escapeXml("\uD84C\uDFB4"));
>        // Example from See http://www.w3.org/International/questions/qa-escapes
>        assertEquals("Supplementary character must be represented using a single escape", "&#x233B4;",
>                StringEscapeUtils.escapeXml("\uD84C;\uDFB4;"));
>
> It still fails.
>
> Shouldn't the API be changed to work for this case too? The W3C seems to say so: "you must use the single, code point value for that character" in:
>
>     * From http://www.w3.org/International/questions/qa-escapes
>     * </p>
>     * <blockquote>
>     * Supplementary characters are those Unicode characters that have code points higher than the characters in
>     * the Basic Multilingual Plane (BMP). In UTF-16 a supplementary character is encoded using two 16-bit surrogate code points from the
>     * BMP. Because of this, some people think that supplementary characters need to be represented using two escapes, but this is incorrect
>     * – you must use the single, code point value for that character. For example, use &#x233B4; rather than &#xD84C;&#xDFB4;.
>     * </blockquote>
>
> Gary
>
> -----Original Message-----
> From: bayard@apache.org [mailto:bayard@apache.org]
> Sent: Tuesday, July 19, 2011 0:58 AM
> To: commits@commons.apache.org
> Subject: svn commit: r1148162 - /commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java
>
> Author: bayard
> Date: Tue Jul 19 04:58:03 2011
> New Revision: 1148162
>
> URL: http://svn.apache.org/viewvc?rev=1148162&view=rev
> Log:
> Updating unit test for LANG-728 to work with Lang 3.0 way of using escapeXml with > 0x7f characters
>
> Modified:
>    commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java
>
> Modified: commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java
> URL: http://svn.apache.org/viewvc/commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java?rev=1148162&r1=1148161&r2=1148162&view=diff
> ==============================================================================
> --- commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java (original)
> +++ commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java Tue Jul 19 04:58:03 2011
> @@ -31,6 +31,9 @@ import org.apache.commons.io.IOUtils;
>  import org.junit.Ignore;
>  import org.junit.Test;
>
> +import org.apache.commons.lang3.text.translate.CharSequenceTranslator;
> +import org.apache.commons.lang3.text.translate.UnicodeEscaper;
> +
>  /**
>  * Unit tests for {@link StringEscapeUtils}.
>  *
> @@ -333,15 +336,13 @@ public class StringEscapeUtilsTest {
>      * @see <a href="http://www.w3.org/International/questions/qa-escapes">Using character escapes in markup and CSS</a>
>      * @see <a href="https://issues.apache.org/jira/browse/LANG-728">LANG-728</a>
>      */
> -    @Ignore
>     @Test
>     public void testEscapeXmlSupplementaryCharacters() {
> -        // Example from https://issues.apache.org/jira/browse/LANG-728
> -        assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
> -                StringEscapeUtils.escapeXml("\uD84C\uDFB4"));
> -        // Example from See http://www.w3.org/International/questions/qa-escapes
> -        assertEquals("Supplementary character must be represented using a single escape", "&#x233B4;",
> -                StringEscapeUtils.escapeXml("\uD84C;\uDFB4;"));
> +        CharSequenceTranslator escapeXml =
> +            StringEscapeUtils.ESCAPE_XML.with( UnicodeEscaper.between(0x7f, Integer.MAX_VALUE) );
> +
> +        assertEquals("Supplementary character must be represented using a single escape", "\u233B4",
> +                escapeXml.translate("\uD84C\uDFB4"));
>     }
>
>     // Tests issue #38569
>
>
>
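
For reference, the W3C point quoted in the thread (one escape per code point, not one per UTF-16 surrogate) can be checked with the JDK alone. The sketch below is only an illustration and is not Commons Lang code: the class name SupplementaryEscapeSketch and the helper escapeAbove7f are hypothetical, and the helper handles only the "> 0x7f" part, not the basic XML entities that ESCAPE_XML covers. It walks the string by code point, so the surrogate pair D84C DFB4 is seen as the single code point 0x233B4 (decimal 144308).

```java
public class SupplementaryEscapeSketch {

    // Hypothetical helper, not part of Commons Lang: writes every code point
    // above 0x7f as one decimal numeric entity and copies the rest unchanged.
    static String escapeAbove7f(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            // codePointAt combines a valid surrogate pair into one code point.
            int codePoint = input.codePointAt(i);
            if (codePoint > 0x7f) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.append((char) codePoint);
            }
            // Advance by 2 chars for a supplementary code point, else by 1.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "\uD84C\uDFB4";
        System.out.println(s.codePointAt(0) == 0x233B4); // prints "true"
        System.out.println(escapeAbove7f(s));            // prints "&#144308;"
    }
}
```

Iterating by char instead of by code point would yield two entities, one for each surrogate, which is the form the W3C note calls incorrect.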

