commons-issues mailing list archives

From "ASF GitHub Bot (Jira)" <>
Subject [jira] [Work logged] (LANG-1606) StringUtils.countMatches returns incorrect value while handling intersecting substrings
Date Tue, 01 Sep 2020 08:40:00 GMT


ASF GitHub Bot logged work on LANG-1606:

                Author: ASF GitHub Bot
            Created on: 01/Sep/20 08:39
            Start Date: 01/Sep/20 08:39
    Worklog Time Spent: 10m 
      Work Description: GaliFFun opened a new pull request #615:


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:

Issue Time Tracking

            Worklog Id:     (was: 477072)
    Remaining Estimate: 0h
            Time Spent: 10m

> StringUtils.countMatches returns incorrect value while handling intersecting substrings
> ---------------------------------------------------------------------------------------
>                 Key: LANG-1606
>                 URL:
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 3.11
>            Reporter: Rustem Galiev
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
> Steps to reproduce:
> 1. Call the method as follows:
> {code:java}
> int count = StringUtils.countMatches("abaabaababaab", "aba");
> {code}
> Actual result: count equals 3
> Expected result: count equals 4
> The four occurrences are highlighted in red:
>  {color:#ff0000}aba{color}abaababaab
>  aba{color:#ff0000}aba{color}ababaab
>  abaaba{color:#ff0000}aba{color}baab
>  abaabaab{color:#ff0000}aba{color}ab
> The method returns an incorrect value because of this code:
> {code:java}
> while ((idx = CharSequenceUtils.indexOf(str, sub, idx)) != INDEX_NOT_FOUND) {
>     count++;
>     idx += sub.length();
> }
> {code}
> This looks like a greedy algorithm, but increasing idx by the length of the substring can skip occurrences, as in this example:
> Let's say that idx = 6, so we search for the substring in the highlighted suffix:
>  abaaba{color:#ff0000}ababaab{color}
> The substring is found at index 6, so idx becomes idx + 3 = 9, and the search continues in this suffix:
>  abaabaaba{color:#ff0000}baab{color}
> Because idx was increased by 3, the search misses the occurrence (abaabaab{color:#ff0000}aba{color}ab) that overlaps the match found in the previous step.
> In general, this method undercounts whenever occurrences of the substring overlap each other.
> There is also a unit test with an incorrect expected value:
> {code:java}
> assertEquals(4,
>      StringUtils.countMatches("oooooooooooo", "ooo"));
> {code}
> If this behavior (counting only substrings that do not intersect) is intended, please update the JavaDoc to reflect it. It currently reads:
> {code:java}
> Counts how many times the substring appears in the larger string.
> {code}
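
The difference between the two counting strategies described in the issue can be sketched as follows. This is a minimal illustration, not the Commons Lang implementation: the class and method names are hypothetical, and plain String.indexOf stands in for CharSequenceUtils.indexOf. The only difference between the two methods is how far idx advances after a match.

```java
// Hypothetical sketch: names are illustrative and not part of Commons Lang.
class OverlapCountSketch {

    // Mirrors the loop quoted in the issue: after a match, skip the whole substring.
    static int countNonOverlapping(String str, String sub) {
        if (str == null || sub == null || str.isEmpty() || sub.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = 0;
        while ((idx = str.indexOf(sub, idx)) != -1) {
            count++;
            idx += sub.length(); // occurrences overlapping this match are skipped
        }
        return count;
    }

    // Variant the reporter expects: after a match, advance by one character only.
    static int countOverlapping(String str, String sub) {
        if (str == null || sub == null || str.isEmpty() || sub.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = 0;
        while ((idx = str.indexOf(sub, idx)) != -1) {
            count++;
            idx += 1; // the next search may start inside the current match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNonOverlapping("abaabaababaab", "aba")); // 3
        System.out.println(countOverlapping("abaabaababaab", "aba"));    // 4
        System.out.println(countNonOverlapping("oooooooooooo", "ooo"));  // 4
        System.out.println(countOverlapping("oooooooooooo", "ooo"));     // 10
    }
}
```

Under the overlapping interpretation, the unit test quoted above would expect 10 rather than 4, since "ooo" starts at each of the first ten positions of the twelve-character string.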

This message was sent by Atlassian Jira
