commons-issues mailing list archives

From "HiuFung Kwok (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LANG-1406) StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase
Date Wed, 08 Aug 2018 12:05:00 GMT

    [ https://issues.apache.org/jira/browse/LANG-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573108#comment-16573108 ]

HiuFung Kwok edited comment on LANG-1406 at 8/8/18 12:04 PM:
-------------------------------------------------------------

Hi all,

After a bit of research, this seems to be a known issue when a String contains certain
Unicode characters [ref|https://www.quora.com/Is-Javas-toLowercase-string-method-reliable-for-Unicode]:
String.toLowerCase() can return a result whose length differs from that of the original string.

In this case "\u0130x".toLowerCase() becomes a String of three chars, [ i, U+0307 (combining
dot above), x ], instead of the two chars [ İ, x ] of the original.
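
The length change can be checked with a few lines of Java. This is only a small sketch: the class and method names are made up for illustration, and Locale.ROOT is passed explicitly (an assumption on my side) so the result does not depend on the default locale of the machine.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Commons Lang.
final class LowerCaseLengthDemo {

    // Length of the lower-cased copy, using a fixed locale so the
    // result does not depend on the JVM's default locale.
    static int loweredLength(String text) {
        return text.toLowerCase(Locale.ROOT).length();
    }

    public static void main(String[] args) {
        String text = "\u0130x"; // LATIN CAPITAL LETTER I WITH DOT ABOVE, then 'x'
        System.out.println(text.length());       // prints 2
        System.out.println(loweredLength(text)); // prints 3
    }
}
```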

So, given that result from .toLowerCase(), StringUtils.replaceIgnoreCase ends up trying to
access a segment of the string that does not exist: index 3 in this case, while str.length()
is 2.
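
A rough sketch of how such an index mismatch can arise is below. It is not the Commons Lang source, only a simplified stand-in under my own assumptions (hypothetical class and method names, Locale.ROOT): the match position is found in the lower-cased copy and then applied to the original string.

```java
import java.util.Locale;

// Hypothetical demo class, a simplified stand-in for the failing logic.
final class IndexMismatchDemo {

    // Returns true when an index computed on the lower-cased copy
    // falls outside the original string.
    static boolean overruns(String text, String search) {
        String lowered = text.toLowerCase(Locale.ROOT);
        String loweredSearch = search.toLowerCase(Locale.ROOT);
        int end = lowered.indexOf(loweredSearch);
        if (end < 0) {
            return false; // no match, nothing to slice
        }
        int start = end + loweredSearch.length(); // index in the lowered copy
        try {
            // Applying that index to the original text is the risky step.
            text.substring(start, text.length());
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overruns("\u0130x", "x")); // prints true
        System.out.println(overruns("abx", "x"));     // prints false
    }
}
```

For "\u0130x" the match is found at index 2 of the lower-cased copy, so the next start index is 3, which is past the end of the 2-char original.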

The fix I came up with is to replace .toLowerCase() with .toUpperCase(), in order to avoid
the misinterpretation by .toLowerCase() when performing case-insensitive comparisons.
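
To illustrate the idea, here is a minimal sketch of a replace loop that searches in an upper-cased copy. It is not the linked commit: the class name, the method name, Locale.ROOT and the length guard are all my own assumptions, added so the example is self-contained and only runs when the upper-cased copy lines up with the original.

```java
import java.util.Locale;

// Hypothetical sketch, not the Commons Lang implementation.
final class UpperCaseReplaceSketch {

    static String replaceIgnoreCaseUpper(String text, String search, String replacement) {
        if (text.isEmpty() || search.isEmpty()) {
            return text;
        }
        String searchText = text.toUpperCase(Locale.ROOT);
        String searchString = search.toUpperCase(Locale.ROOT);
        // Assumption of this sketch: indices are only reused when the
        // upper-cased copy has the same length as the original.
        if (searchText.length() != text.length()) {
            throw new IllegalArgumentException("upper-casing changed the length");
        }
        StringBuilder buf = new StringBuilder(text.length());
        int start = 0;
        int end = searchText.indexOf(searchString, start);
        while (end != -1) {
            // Copy the untouched part from the original, then the replacement.
            buf.append(text, start, end).append(replacement);
            start = end + searchString.length();
            end = searchText.indexOf(searchString, start);
        }
        buf.append(text, start, text.length());
        return buf.toString();
    }

    public static void main(String[] args) {
        // Input from the issue report; the result is the single char U+0130.
        String result = replaceIgnoreCaseUpper("\u0130x", "x", "");
        System.out.println(result.equals("\u0130")); // prints true
    }
}
```

With the input from the report, "\u0130" is already upper case, so the upper-cased copy keeps length 2 and the indices match the original string.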

Fix: [https://github.com/HiuKwok/commons-lang/commit/e0f6c7802b5e721019a602bf30b31c79dbf6d233]

Testcase: https://github.com/HiuKwok/commons-lang/commit/590f90889bf61a5570bd98b78e73410a07d7410b

 

 


> StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase
> ----------------------------------------------------------------
>
>                 Key: LANG-1406
>                 URL: https://issues.apache.org/jira/browse/LANG-1406
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>            Reporter: Michael Ryan
>            Priority: Major
>
> STEPS TO REPRODUCE:
> {code}
> StringUtils.replaceIgnoreCase("\u0130x", "x", "")
> {code}
> EXPECTED: "\u0130" is returned.
> ACTUAL: StringIndexOutOfBoundsException
> This happens because the replace method is assuming that text.length() == text.toLowerCase().length(),
> which is not true for certain characters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
