commons-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LANG-1300) Clarify or improve behaviour of int-based methods in StringUtils
Date Fri, 10 Mar 2017 07:09:04 GMT

    [ https://issues.apache.org/jira/browse/LANG-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904594#comment-15904594 ]

ASF GitHub Bot commented on LANG-1300:
--------------------------------------

GitHub user dmjones500 commented on the issue:

    https://github.com/apache/commons-lang/pull/251
  
    We need to agree on the desired behaviour of the method. Based on the signature, I think we can adjust it to support supplementary characters without violating the implied contract of the method. The open question for me is which kind of index the method should return.
    
    Based on the Javadoc description, I suggest we return the code unit index. This is because the input is a `CharSequence` and the Javadoc states:
    
    > Finds the first index within a `CharSequence`, handling null.
    
    Also, the index-based methods of `CharSequence` (`charAt` and `subSequence`) operate on code unit indexes, so returning a code unit index would be helpful for further operations on the sequence.
    
    What does everyone else think? Whichever way we decide, we should update the Javadoc to make the behaviour crystal clear.
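
For illustration, here is a minimal sketch of a code-point-aware search over an arbitrary `CharSequence` that returns a code unit index, as suggested above. The class `CodeUnitIndexSketch` and the helper `indexOfCodePoint` are hypothetical names made up for this example. They are not part of Commons Lang and not the change proposed in the pull request.

```java
final class CodeUnitIndexSketch {

    /**
     * Hypothetical helper (not Commons Lang API): finds the first occurrence
     * of the code point searchChar and returns its code unit index, or -1.
     */
    static int indexOfCodePoint(final CharSequence seq, final int searchChar) {
        if (seq == null) {
            return -1;
        }
        int i = 0;
        while (i < seq.length()) {
            // Reads a full surrogate pair as one code point where present.
            final int cp = Character.codePointAt(seq, i);
            if (cp == searchChar) {
                return i; // position counted in chars (code units)
            }
            // Advance by 1 for BMP characters, by 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(final String[] args) {
        final StringBuilder builder = new StringBuilder("a");
        builder.appendCodePoint(0x2070E); // occupies code units 1 and 2
        builder.append('b');

        final int unitIndex = indexOfCodePoint(builder, 'b');
        System.out.println(unitIndex);                 // 3
        System.out.println(builder.charAt(unitIndex)); // b
        // The code point index of 'b' would be 2, which charAt cannot use.
        System.out.println(Character.codePointCount(builder, 0, unitIndex)); // 2
    }
}
```

Because the result is a code unit index, it can be passed straight to `charAt` or `subSequence`. One difference from `String.indexOf(int)`: this sketch steps over whole surrogate pairs, so it never matches a lone surrogate value that sits inside a valid pair.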


> Clarify or improve behaviour of int-based methods in StringUtils
> ----------------------------------------------------------------
>
>                 Key: LANG-1300
>                 URL: https://issues.apache.org/jira/browse/LANG-1300
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>    Affects Versions: 3.5
>            Reporter: Duncan Jones
>            Priority: Minor
>             Fix For: Discussion
>
>
> The following methods use an {{int}} to represent a search character:
> {code:java}
> boolean contains(final CharSequence seq, final int searchChar)
> int indexOf(final CharSequence seq, final int searchChar)
> int indexOf(final CharSequence seq, final int searchChar, final int startPos)
> int lastIndexOf(final CharSequence seq, final int searchChar)
> int lastIndexOf(final CharSequence seq, final int searchChar, final int startPos)
> {code}
> When I see an {{int}} representing a character, I tend to assume the method can handle supplementary characters. However, the current behaviour of these methods depends upon whether the {{CharSequence}} is a {{String}} or not.
> {code:java}
> StringBuilder builder = new StringBuilder();
> builder.appendCodePoint(0x2070E);
> System.out.println(StringUtils.lastIndexOf(builder, 0x2070E)); // -1
> System.out.println(StringUtils.lastIndexOf(builder.toString(), 0x2070E)); // 0
> {code}
> The Javadoc for these methods is ambiguous on this point, stating:
> {quote}
> This method uses {{String.lastIndexOf(int)}} if possible.
> {quote}
> I think we should consider updating the {{CharSequenceUtils}} methods used by this class to convert all {{CharSequence}} parameters to strings, enabling full code point support. The docs could be updated to make this crystal clear.
> There is a question of whether this breaks backwards compatibility.
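
As a rough illustration of the conversion proposed in the issue, the sketch below delegates to `String.lastIndexOf(int)` after calling `toString()` on the sequence. The class `ToStringDelegationSketch` and its `lastIndexOf` method are hypothetical stand-ins written for this example. They are not the actual `CharSequenceUtils` code, and the handling of null and empty input is an assumption.

```java
final class ToStringDelegationSketch {

    /**
     * Hypothetical stand-in for the proposed behaviour (not Commons Lang code):
     * converts any CharSequence to a String before searching.
     */
    static int lastIndexOf(final CharSequence seq, final int searchChar) {
        if (seq == null || seq.length() == 0) {
            return -1; // assumed null/empty handling for this sketch
        }
        // String.lastIndexOf(int) interprets searchChar as a Unicode code point
        // and returns a code unit index.
        return seq.toString().lastIndexOf(searchChar);
    }

    public static void main(final String[] args) {
        final StringBuilder builder = new StringBuilder();
        builder.appendCodePoint(0x2070E);

        // Both calls now agree, whatever the CharSequence implementation.
        System.out.println(lastIndexOf(builder, 0x2070E));            // 0
        System.out.println(lastIndexOf(builder.toString(), 0x2070E)); // 0
    }
}
```

The trade-off is that `toString()` on a `StringBuilder` copies its contents, so each search on a non-`String` sequence pays for an extra allocation.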



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
