commons-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Work logged] (LANG-1606) StringUtils.countMatches returns incorrect value while handling intersecting substrings
Date Tue, 01 Sep 2020 12:13:00 GMT

     [ https://issues.apache.org/jira/browse/LANG-1606?focusedWorklogId=477184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-477184 ]

ASF GitHub Bot logged work on LANG-1606:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Sep/20 12:12
            Start Date: 01/Sep/20 12:12
    Worklog Time Spent: 10m 
      Work Description: sebbASF commented on pull request #615:
URL: https://github.com/apache/commons-lang/pull/615#issuecomment-684807243


   Thanks for the PR, but it is the Javadoc that must be fixed, not the code.
   If necessary, raise a separate issue for a version of the method that supports counting overlapping occurrences.
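A separate overlapping-count method, as suggested above, could look roughly like the following minimal sketch. The name `countOverlappingMatches` is hypothetical and nothing here is meant to describe an existing Commons Lang API; the only change from the greedy loop is that the search restarts one character after each match instead of `sub.length()` characters after it. Null or empty arguments return 0, matching the documented behavior of `countMatches`.

```java
// Hypothetical helper (not part of Commons Lang): counts overlapping
// occurrences of sub in str.
public final class OverlappingCount {

    static int countOverlappingMatches(String str, String sub) {
        // Same convention as StringUtils.countMatches: null/empty input gives 0.
        if (str == null || sub == null || str.isEmpty() || sub.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = 0;
        while ((idx = str.indexOf(sub, idx)) != -1) {
            count++;
            idx += 1; // step past the start of this match only, so overlaps are found
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlappingMatches("abaabaababaab", "aba")); // 4
    }
}
```

Advancing by one costs more `indexOf` calls than advancing by the match length, which is one reason to keep it as a separate method rather than changing `countMatches` itself.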


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 477184)
    Time Spent: 0.5h  (was: 20m)

> StringUtils.countMatches returns incorrect value while handling intersecting substrings
> ---------------------------------------------------------------------------------------
>
>                 Key: LANG-1606
>                 URL: https://issues.apache.org/jira/browse/LANG-1606
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 3.11
>            Reporter: Rustem Galiev
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> 1. Call the method like this:
> {code:java}
> int count = StringUtils.countMatches("abaabaababaab", "aba");
> {code}
> Actual result: the count variable equals 3
> Expected result: the count variable equals 4
> The four occurrences of the substring are highlighted in red:
>  {color:#ff0000}aba{color}abaababaab
>  aba{color:#ff0000}aba{color}ababaab
>  abaaba{color:#ff0000}aba{color}baab
>  abaabaab{color:#ff0000}aba{color}ab
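The four highlighted occurrences start at indices 0, 3, 6 and 8. As an independent check that does not reuse the `indexOf` loop, the sketch below lists every start index with a zero-width regex lookahead; `startIndices` is a hypothetical name used only for illustration, and `Pattern.quote` makes the pattern match literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Lists the start index of every (possibly overlapping) literal occurrence.
// The lookahead (?=...) consumes no input, so after each zero-length match
// the matcher moves on by one position and can report overlapping starts.
public final class OccurrenceIndices {

    static List<Integer> startIndices(String str, String sub) {
        Pattern p = Pattern.compile("(?=" + Pattern.quote(sub) + ")");
        Matcher m = p.matcher(str);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(startIndices("abaabaababaab", "aba")); // [0, 3, 6, 8]
    }
}
```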
> The method returns an incorrect value because of this code:
> {code:java}
> while ((idx = CharSequenceUtils.indexOf(str, sub, idx)) != INDEX_NOT_FOUND) {
>     count++;
>     idx += sub.length();
> }
> {code}
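The quoted loop can be reproduced outside the library to see exactly which matches it visits. This is a minimal sketch that assumes `String.indexOf(String, int)` behaves like the internal `CharSequenceUtils.indexOf` call for plain `String` arguments; `greedyMatchStarts` is a hypothetical name, and it records each match index where the original only increments `count`.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone reproduction of the quoted greedy loop.
public final class GreedyCountTrace {

    static List<Integer> greedyMatchStarts(String str, String sub) {
        List<Integer> starts = new ArrayList<>();
        if (sub.isEmpty()) {
            return starts; // guard: idx += 0 would loop forever
        }
        int idx = 0;
        while ((idx = str.indexOf(sub, idx)) != -1) {
            starts.add(idx);       // corresponds to count++ in the original
            idx += sub.length();   // jump past the whole match
        }
        return starts;
    }

    public static void main(String[] args) {
        List<Integer> starts = greedyMatchStarts("abaabaababaab", "aba");
        System.out.println(starts + " -> count " + starts.size()); // [0, 3, 6] -> count 3
    }
}
```

Recording the indices rather than only a counter makes the skipped occurrence at index 8 visible: after the match at 6, the search resumes at 9.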
> This looks like a greedy algorithm, but advancing idx by the full length of the substring after each match causes problems like the one in the example:
> Say idx = 6, so we search for the substring in the highlighted suffix:
>  abaaba{color:#ff0000}ababaab{color}
> The substring is found at index 6, so idx becomes 6 + 3 = 9, and the search continues in this suffix:
>  abaabaaba{color:#ff0000}baab{color}
>  But because idx was advanced by 3, the search misses the occurrence at index 8 (abaabaab{color:#ff0000}aba{color}ab), which overlaps the occurrence found in the previous step.
> In general, the method undercounts whenever occurrences of the substring overlap each other.
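Put more precisely, occurrences can only overlap when the substring overlaps a shifted copy of itself: for some shift d with 0 < d < sub.length(), the suffix starting at d equals the prefix of the same length ("a" for "aba", "oo" for "ooo"). When no such shift exists, the greedy count and an overlapping count always agree. The hypothetical helper below is a minimal sketch of that check using `String.regionMatches`.

```java
// Hypothetical helper: returns true if two occurrences of sub can overlap,
// i.e. some proper suffix of sub is also a prefix of sub.
public final class SelfOverlap {

    static boolean canSelfOverlap(String sub) {
        int m = sub.length();
        for (int d = 1; d < m; d++) {
            // Compare sub[d..m) with sub[0..m-d).
            if (sub.regionMatches(d, sub, 0, m - d)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(canSelfOverlap("aba")); // true  ("a" is both prefix and suffix)
        System.out.println(canSelfOverlap("ooo")); // true
        System.out.println(canSelfOverlap("abc")); // false
    }
}
```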
> There is also a unit test with an incorrect expected value:
> {code:java}
> assertEquals(4,
>      StringUtils.countMatches("oooooooooooo", "ooo"));
> {code}
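For this test the two interpretations give clearly different answers. With a text of n copies of "o" and a pattern of m copies, the greedy loop matches at 0, m, 2m, ... while an overlapping count matches at every start position from 0 to n - m. For n = 12 and m = 3:

```latex
\text{non-overlapping} = \left\lfloor \frac{n}{m} \right\rfloor = \left\lfloor \frac{12}{3} \right\rfloor = 4,
\qquad
\text{overlapping} = n - m + 1 = 12 - 3 + 1 = 10
```

So the expected value 4 is correct only under the non-overlapping interpretation.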
> If this behavior (counting only non-overlapping occurrences) is intended, please update the Javadoc to reflect it. Currently it reads:
> {code:java}
> Counts how many times the substring appears in the larger string.
> {code}
> Link to the PR: https://github.com/apache/commons-lang/pull/615



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
