commons-issues mailing list archives

From "ASF GitHub Bot (Jira)" <>
Subject [jira] [Work logged] (LANG-1606) StringUtils.countMatches returns incorrect value while handling intersecting substrings
Date Tue, 01 Sep 2020 09:30:00 GMT


ASF GitHub Bot logged work on LANG-1606:

                Author: ASF GitHub Bot
            Created on: 01/Sep/20 09:29
            Start Date: 01/Sep/20 09:29
    Worklog Time Spent: 10m 
      Work Description: coveralls commented on pull request #615:

   Coverage remained the same at 94.705% when pulling **14e88e0a96f7b63302574f7a9a2d2996766c6aba on GaliFFun:LANG-1606-countMatches** into **649dedbbe8b6ab61fb3c4792c86b7e0af7ec4a73 on apache:master**.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

Issue Time Tracking

    Worklog Id:     (was: 477097)
    Time Spent: 20m  (was: 10m)

> StringUtils.countMatches returns incorrect value while handling intersecting substrings
> ---------------------------------------------------------------------------------------
>                 Key: LANG-1606
>                 URL:
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 3.11
>            Reporter: Rustem Galiev
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
> Steps to reproduce:
> 1. Call the method like this:
> {code:java}
> int count = StringUtils.countMatches("abaabaababaab", "aba");
> {code}
> Actual result: the value of the count variable is 3
> Expected result: the value of the count variable is 4
> The four occurrences of the substring are highlighted in red:
>  {color:#ff0000}aba{color}abaababaab
>  aba{color:#ff0000}aba{color}ababaab
>  abaaba{color:#ff0000}aba{color}baab
>  abaabaab{color:#ff0000}aba{color}ab
> The method returns an incorrect value because of this loop:
> {code:java}
> while ((idx = CharSequenceUtils.indexOf(str, sub, idx)) != INDEX_NOT_FOUND) {
>     count++;
>     idx += sub.length();
> }
> {code}
> This looks like a greedy algorithm, but advancing idx by the length of the substring skips overlapping matches, as in this example.
> Say idx = 6, so we search for the substring in the highlighted suffix:
>  abaaba{color:#ff0000}ababaab{color}
> The substring is found at index 6, so idx becomes idx + 3 = 9, and the next search covers only this suffix:
>  abaabaaba{color:#ff0000}baab{color}
> Because idx was advanced by 3, the search misses the occurrence (abaabaab{color:#ff0000}aba{color}ab) that overlaps the match found in the previous step.
> In general, this method returns an incorrect count whenever occurrences of the substring overlap each other.
> There is also a unit test with an incorrect expected value:
> {code:java}
> assertEquals(4,
>      StringUtils.countMatches("oooooooooooo", "ooo"));
> {code}
> If this behavior (counting only occurrences that do not overlap) is intended, please update the JavaDoc to say so. It currently reads:
> {code:java}
> Counts how many times the substring appears in the larger string.
> {code}
> Link for the PR:
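
To make the difference described in the report concrete, here is a minimal sketch that contrasts the two counting strategies. It is an illustration only, not the Commons Lang implementation and not the change proposed in the pull request. The class name `CountMatchesSketch` and the methods `countNonOverlapping` and `countOverlapping` are hypothetical, and the sketch uses the JDK's `String.indexOf` in place of `CharSequenceUtils.indexOf`.

```java
// Hypothetical illustration; names are not part of Commons Lang.
final class CountMatchesSketch {

    // Mirrors the loop quoted in the report: after each match the search
    // resumes past the end of the match, so overlapping occurrences are skipped.
    public static int countNonOverlapping(String str, String sub) {
        if (str == null || sub == null || str.isEmpty() || sub.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = 0;
        while ((idx = str.indexOf(sub, idx)) != -1) {
            count++;
            idx += sub.length();
        }
        return count;
    }

    // Variant that the reporter's expected result implies: after each match
    // the search resumes one character later, so overlapping occurrences count.
    public static int countOverlapping(String str, String sub) {
        if (str == null || sub == null || str.isEmpty() || sub.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = 0;
        while ((idx = str.indexOf(sub, idx)) != -1) {
            count++;
            idx += 1;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNonOverlapping("abaabaababaab", "aba")); // 3
        System.out.println(countOverlapping("abaabaababaab", "aba"));    // 4
        System.out.println(countNonOverlapping("oooooooooooo", "ooo"));  // 4
        System.out.println(countOverlapping("oooooooooooo", "ooo"));     // 10
    }
}
```

With the report's input, the first strategy finds matches at indices 0, 3 and 6, while the second also finds the match at index 8. For the unit test input of twelve `o` characters, the two strategies give 4 and 10 respectively, which is why the intended semantics need to be stated in the JavaDoc.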

This message was sent by Atlassian Jira