commons-issues mailing list archives

From "Dominik Strecker (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LANG-1343) StringUtils#abbreviate breaks up surrogate pairs
Date Thu, 29 Jun 2017 09:45:00 GMT
Dominik Strecker created LANG-1343:
--------------------------------------

             Summary: StringUtils#abbreviate breaks up surrogate pairs
                 Key: LANG-1343
                 URL: https://issues.apache.org/jira/browse/LANG-1343
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 3.6
            Reporter: Dominik Strecker
            Priority: Minor


If the last char of the retained substring is the high (leading) surrogate of a surrogate pair,
abbreviate cuts the pair in half. The resulting string then contains an unpaired high surrogate
followed directly by the first char of the ellipsis, which is not valid UTF-16.


{code:java}
StringUtils.abbreviate("\uD83D\uDCA9\uD83D\uDCA9\uD83D\uDCA9", 4); // returns "\uD83D..."
{code}

In my case this breaks further along when the string is transformed to UTF-8 for a SOAP request.
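
The report does not say which SOAP stack or encoder is involved, so the following is only a sketch of one way the problem can surface: a strict UTF-8 encoder rejects a string that contains an unpaired surrogate. The class `Utf8Check` and the method `encodesCleanly` are hypothetical names used for illustration.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Commons Lang or any SOAP library.
final class Utf8Check {

    // Returns true if the string can be encoded as UTF-8 without errors.
    // A freshly created encoder reports malformed input (such as an
    // unpaired surrogate) instead of silently replacing it.
    static boolean encodesCleanly(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

With this helper, `Utf8Check.encodesCleanly("\uD83D...")` is false, while `Utf8Check.encodesCleanly("\uD83D\uDCA9...")` is true. Note that `String.getBytes(StandardCharsets.UTF_8)` does not throw in this situation; it substitutes the charset's replacement bytes, so the damage may only become visible further downstream.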

Should this at least be mentioned in the Javadoc?
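
Independently of any Javadoc change, a caller-side workaround is possible. The following is a minimal sketch, not the Commons Lang implementation: `SurrogateSafeAbbreviate` and `abbreviateSafe` are hypothetical names, and the sketch assumes the same "..." marker and minimum width of 4 as the simple `abbreviate(String, int)` overload. It moves the cut point back by one char whenever the cut would fall between a high and a low surrogate.

```java
// Hypothetical workaround, not part of Commons Lang.
final class SurrogateSafeAbbreviate {

    private static final String MARKER = "...";

    static String abbreviateSafe(String str, int maxWidth) {
        if (str == null) {
            return null;
        }
        if (maxWidth < 4) {
            throw new IllegalArgumentException("Minimum abbreviation width is 4");
        }
        if (str.length() <= maxWidth) {
            return str;
        }
        // Number of chars to keep before the marker.
        int end = maxWidth - MARKER.length();
        // If the cut would separate a high surrogate from its low surrogate,
        // drop the high surrogate as well so no unpaired char is left behind.
        if (Character.isHighSurrogate(str.charAt(end - 1))
                && Character.isLowSurrogate(str.charAt(end))) {
            end--;
        }
        return str.substring(0, end) + MARKER;
    }
}
```

For the input from this report, `abbreviateSafe("\uD83D\uDCA9\uD83D\uDCA9\uD83D\uDCA9", 4)` returns `"..."`, and a width of 5 returns `"\uD83D\uDCA9..."`. The trade-off is that the result can be one char shorter than `maxWidth`.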



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
