commons-issues mailing list archives

From "Henri Yandell (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LANG-480) StringEscapeUtils.escapeHtml incorrectly converts unicode characters above U+00FFFF into 2 characters
Date Sun, 01 Mar 2009 20:47:15 GMT

     [ https://issues.apache.org/jira/browse/LANG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-480:
-------------------------------

    Description: 
Characters that Java represents internally as two chars (a surrogate pair) are converted
incorrectly by the function. The following test demonstrates the problem:

import org.apache.commons.lang.*;

public class J2 {
    public static void main(String[] args) throws Exception {
        // this is the utf8 representation of the character:
        // COUNTING ROD UNIT DIGIT THREE
        // in unicode
        // codepoint: U+1D362
        byte[] data = new byte[] { (byte)0xF0, (byte)0x9D, (byte)0x8D, (byte)0xA2 };

        //output is: &amp;#55348;&amp;#57186;
        // should be: &amp;#119650;
        System.out.println("'" + StringEscapeUtils.escapeHtml(new String(data, "UTF8")) + "'");
    }
}

Should be very quick to fix, feel free to drop me an email if you want a patch.

  was:
Characters that Java represents internally as two chars (a surrogate pair) are converted
incorrectly by the function. The following test demonstrates the problem:

import org.apache.commons.lang.*;

public class J2 {
    public static void main(String[] args) throws Exception {
        // this is the utf8 representation of the character:
        // COUNTING ROD UNIT DIGIT THREE
        // in unicode
        // codepoint: U+1D362
        byte[] data = new byte[] { (byte)0xF0, (byte)0x9D, (byte)0x8D, (byte)0xA2 };

        //output is: &#55348;&#57186;
        // should be: &#119650;
        System.out.println("'" + StringEscapeUtils.escapeHtml(new String(data, "UTF8")) + "'");
    }
}

Should be very quick to fix, feel free to drop me an email if you want a patch.
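
For illustration only: the two numbers in the reported output (55348 and 57186) are the UTF-16 high and low surrogates of U+1D362, which suggests the string is being walked one char at a time. A minimal sketch of a code-point-aware loop follows. It is not the attached lang-480.patch and not the Commons Lang implementation; the class name CodePointEscapeSketch and the method escapeNonAscii are hypothetical, and the sketch escapes only non-ASCII code points (it does not handle named entities or markup characters such as < and &).

```java
public class CodePointEscapeSketch {

    // Hypothetical helper, not part of Commons Lang: writes every non-ASCII
    // code point as a decimal numeric character reference.
    public static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        int i = 0;
        while (i < input.length()) {
            // codePointAt combines a valid surrogate pair into one code point
            int cp = input.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // advance by 1 char for BMP characters, 2 for supplementary ones
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // COUNTING ROD UNIT DIGIT THREE, U+1D362
        String s = new String(Character.toChars(0x1D362));
        System.out.println("'" + escapeNonAscii(s) + "'");
    }
}
```

Stepping by Character.charCount(cp) rather than by 1 is what keeps the two halves of a surrogate pair together, so U+1D362 yields a single reference instead of two.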


> StringEscapeUtils.escapeHtml incorrectly converts unicode characters above U+00FFFF into 2 characters
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LANG-480
>                 URL: https://issues.apache.org/jira/browse/LANG-480
>             Project: Commons Lang
>          Issue Type: Bug
>    Affects Versions: 2.4
>         Environment: doesn't matter
>            Reporter: Alexander Kjäll
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: lang-480.patch
>
>
> Characters that Java represents internally as two chars (a surrogate pair) are converted incorrectly by the function. The following test demonstrates the problem:
> import org.apache.commons.lang.*;
> public class J2 {
>     public static void main(String[] args) throws Exception {
>         // this is the utf8 representation of the character:
>         // COUNTING ROD UNIT DIGIT THREE
>         // in unicode
>         // codepoint: U+1D362
>         byte[] data = new byte[] { (byte)0xF0, (byte)0x9D, (byte)0x8D, (byte)0xA2 };
>         //output is: &amp;#55348;&amp;#57186;
>         // should be: &amp;#119650;
>         System.out.println("'" + StringEscapeUtils.escapeHtml(new String(data, "UTF8")) + "'");
>     }
> }
> Should be very quick to fix, feel free to drop me an email if you want a patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

