commons-issues mailing list archives

From Alexander Kjäll (JIRA) <j...@apache.org>
Subject [jira] Commented: (LANG-480) StringEscapeUtils.escapeHtml incorrectly converts unicode characters above U+00FFFF into 2 characters
Date Thu, 22 Jan 2009 14:35:59 GMT

    [ https://issues.apache.org/jira/browse/LANG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666155#action_12666155 ]

Alexander Kjäll commented on LANG-480:
--------------------------------------

Just my 2 cents: I don't need a release that fixes this bug. I stumbled on it by chance and
wrote a patch so that the next person who has the same problem won't have to dig
through the library to understand what's going on.

I'm mainly interested in fixing this because I don't like buggy software, but I totally agree
that building in reflection stuff leads to more problems than it solves in the long run.

My opinion on how to fix this is to either push for the JDK 1.5 dependency or write some code
that parses the format in which the strings are stored in memory. The latter might sound
complicated, but I think it's quite straightforward.
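
As a rough illustration of the second option, here is a minimal sketch that combines
UTF-16 surrogate pairs by hand, using only APIs that exist before JDK 1.5. The class
and method names (SurrogateEscapeSketch, escapeNonAscii) are hypothetical. This is not
the attached lang-480.patch and not the Commons Lang implementation, and unlike
escapeHtml it knows nothing about named entities; it only emits decimal numeric entities.

```java
// Hypothetical sketch: not the attached lang-480.patch and not Commons Lang code.
final class SurrogateEscapeSketch {

    // Escapes every code point above 0x7F as a decimal numeric entity.
    // Surrogate pairs are combined manually, so Character.codePointAt
    // (added in JDK 1.5) is not needed.
    static String escapeNonAscii(String s) {
        StringBuffer out = new StringBuffer(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int codePoint = c;
            // A high surrogate (D800..DBFF) followed by a low surrogate
            // (DC00..DFFF) encodes one code point above U+FFFF.
            if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.length()) {
                char low = s.charAt(i + 1);
                if (low >= 0xDC00 && low <= 0xDFFF) {
                    codePoint = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                    i++; // the low surrogate has been consumed
                }
            }
            if (codePoint > 0x7F) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D362 is stored as the surrogate pair D834 DF62.
        System.out.println(escapeNonAscii("\uD834\uDF62")); // prints &#119650;
    }
}
```

An unpaired surrogate falls through and is escaped as its own char value, which is
one possible choice; a real fix might prefer to reject or replace it.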

> StringEscapeUtils.escapeHtml incorrectly converts unicode characters above U+00FFFF into 2 characters
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LANG-480
>                 URL: https://issues.apache.org/jira/browse/LANG-480
>             Project: Commons Lang
>          Issue Type: Bug
>    Affects Versions: 2.4
>         Environment: doesn't matter
>            Reporter: Alexander Kjäll
>            Priority: Minor
>         Attachments: lang-480.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Characters that Java represents internally as 2 chars (a surrogate pair) are incorrectly
> converted by the function. The following test displays the problem quite nicely:
> import org.apache.commons.lang.*;
> public class J2 {
>     public static void main(String[] args) throws Exception {
>         // this is the utf8 representation of the character:
>         // COUNTING ROD UNIT DIGIT THREE
>         // in unicode
>         // codepoint: U+1D362
>         byte[] data = new byte[] { (byte)0xF0, (byte)0x9D, (byte)0x8D, (byte)0xA2 };
>         //output is: &#55348;&#57186;
>         // should be: &#119650;
>         System.out.println("'" + StringEscapeUtils.escapeHtml(new String(data, "UTF8")) + "'");
>     }
> }
> Should be very quick to fix, feel free to drop me an email if you want a patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

