commons-issues mailing list archives

From "Matt Benson (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (LANG-708) StringEscapeUtils.escapeEcmaScript from lang3 cuts off long unicode string
Date Fri, 15 Jul 2011 03:33:00 GMT

     [ https://issues.apache.org/jira/browse/LANG-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Benson resolved LANG-708.
------------------------------

    Resolution: Duplicate

With LANG-720 fixed, lang3 trunk no longer cuts off the end of the string.

> StringEscapeUtils.escapeEcmaScript from lang3 cuts off long unicode string
> --------------------------------------------------------------------------
>
>                 Key: LANG-708
>                 URL: https://issues.apache.org/jira/browse/LANG-708
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.text.translate.*
>            Reporter: anton
>             Fix For: 3.x
>
>         Attachments: Test.java, input.txt
>
>
> Hello, I have a very large JSON string (generated from a database) containing Unicode characters, and I need to pass it through StringEscapeUtils.escapeEcmaScript(value). In most cases this works, but I have found a specific string (attached) that is not converted correctly: about 10 characters at the end of the string are cut off (and the missing characters are not themselves Unicode characters).
> the original string ends with:
>  "geonameId":6544329,"valueCode":""}]
> and the produced string ends with:
>  \"geonameId\":6544329,\"value
> So the Code":""}] part is missing, which prevents the result from being parsed as JSON on the client side.
> I have tried to debug the StringEscapeUtils.escapeEcmaScript source code a bit, and it seems that the problem is somewhere around here:
> CharSequenceTranslator.translate(...){
> ...
>         int sz = Character.codePointCount(input, 0, input.length());
>         for (int i = 0; i < sz; i++) {
>             // consumed is the number of codepoints consumed
>             int consumed = translate(input, i, out);
>             if(consumed == 0) { 
>                 out.write( Character.toChars( Character.codePointAt(input, i) ) );
>             }
> ...
> }
> If I set a conditional breakpoint to stop in the loop when i==(sz-5), I can see that the last characters of the "valueCode" literal are being appended to the "out" stream, but the loop condition ends too early to reach the end of the original input String.
> So, it seems that for the provided string either the sz value is calculated incorrectly or the processing loop adjusted the counter incorrectly at some point.
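
The quoted loop bounds its index with a code point count while passing the same index to Character.codePointAt, which expects a char index. This message does not say what the LANG-720 fix changed, so the following is only a minimal sketch of how such a mismatch could drop trailing characters, assuming the input contains supplementary characters (surrogate pairs). The class and method names are hypothetical and are not part of Commons Lang.

```java
class CodePointLoopSketch {

    // Hypothetical simplification of the quoted loop: the bound counts
    // code points, but the index is used as a char (UTF-16 unit) index.
    static String copyWithCodePointBound(String input) {
        StringBuilder out = new StringBuilder();
        int sz = Character.codePointCount(input, 0, input.length());
        for (int i = 0; i < sz; i++) {
            out.append(input.charAt(i));
        }
        return out.toString();
    }

    // One consistent alternative: bound by char length and advance the
    // index by the number of chars each code point occupies.
    static String copyByCodePoint(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int codePoint = Character.codePointAt(input, i);
            out.appendCodePoint(codePoint);
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two supplementary characters (two chars each), then an ASCII tail.
        String input = "\uD83D\uDE00\uD83D\uDE00 \"valueCode\":\"\"}]";
        System.out.println("chars:       " + input.length());
        System.out.println("code points: "
                + Character.codePointCount(input, 0, input.length()));
        System.out.println("mismatched:  " + copyWithCodePointBound(input));
        System.out.println("consistent:  " + copyByCodePoint(input));
    }
}
```

In this sketch each surrogate pair makes the code point count one smaller than the char length, so the mismatched loop stops that many chars before the end of the input, even though the dropped characters are plain ASCII.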

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
