commons-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEXT-98) Remove isDelimiter() and use HashSets for delimiter check
Date Mon, 24 Jul 2017 17:36:00 GMT

    [ https://issues.apache.org/jira/browse/TEXT-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098884#comment-16098884 ]

ASF GitHub Bot commented on TEXT-98:
------------------------------------

Github user ameyjadiye commented on a diff in the pull request:

    https://github.com/apache/commons-text/pull/57#discussion_r129102692
  
    --- Diff: src/main/java/org/apache/commons/text/WordUtils.java ---
    @@ -747,45 +750,29 @@ public static boolean containsAllWords(final CharSequence word, final CharSequen
             return true;
         }
     
    -    //-----------------------------------------------------------------------
    +    // -----------------------------------------------------------------------
         /**
    -     * Is the character a delimiter.
    +     * <p>
    +     * Converts an array of delimiters to a hash set of code points. Code point of space(32) is added as the default
    +     * value if delimiters is null. The generated hash set provides O(1) lookup time.
    +     * </p>
          *
    -     * @param ch  the character to check
    -     * @param delimiters  the delimiters
    -     * @return true if it is a delimiter
    +     * @param delimiters set of characters to determine capitalization, null means whitespace
    +     * @return Set<Integer>
          */
    -    public static boolean isDelimiter(final char ch, final char[] delimiters) {
    -        if (delimiters == null) {
    -            return Character.isWhitespace(ch);
    -        }
    -        for (final char delimiter : delimiters) {
    -            if (ch == delimiter) {
    -                return true;
    +    private static Set<Integer> generateDelimiterSet(final char[] delimiters) {
    +        Set<Integer> delimiterHashSet = new HashSet<>();
    +        if (delimiters == null || delimiters.length == 0) {
    +            if (delimiters == null) {
    +                delimiterHashSet.add(Character.codePointAt(new char[] {' '}, 0));
                 }
    +            return delimiterHashSet;
             }
    -        return false;
    -    }
     
    -  //-----------------------------------------------------------------------
    -    /**
    -     * Is the codePoint a delimiter.
    -     *
    -     * @param codePoint the codePint to check
    -     * @param delimiters  the delimiters
    -     * @return true if it is a delimiter
    -     */
    -    public static boolean isDelimiter(final int codePoint, final char[] delimiters) {
    --- End diff --
    
    Rather than removing this method, we should keep it.


> Remove isDelimiter() and use HashSets for delimiter check
> ---------------------------------------------------------
>
>                 Key: TEXT-98
>                 URL: https://issues.apache.org/jira/browse/TEXT-98
>             Project: Commons Text
>          Issue Type: Improvement
>    Affects Versions: 1.1
>            Reporter: Arun Vinud 
>            Priority: Minor
>             Fix For: 1.2
>
>
> The current implementation of *capitalize*, *uncapitalize* and *initials* in *WordUtils* calls *isDelimiter* for every character and/or code point, and *isDelimiter* loops through the array of delimiters to check for an occurrence. With n input characters and k delimiters this costs O(nk); building a hash set of the delimiters once reduces it to O(n + k), that is, O(n) if n > k or O(k) if k > n.
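
The idea in the issue description can be sketched roughly as follows. This is a minimal illustration and not the code from the pull request: the class name DelimiterSetSketch and the methods toDelimiterSet and capitalize are hypothetical, and the null handling (a set containing only the space code point) is an assumption taken from the generateDelimiterSet lines visible in the diff above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names are illustrative, not part of Commons Text.
public class DelimiterSetSketch {

    // Build the lookup set once: O(k) for k delimiters.
    static Set<Integer> toDelimiterSet(final char[] delimiters) {
        final Set<Integer> set = new HashSet<>();
        if (delimiters == null) {
            // Assumption mirrored from the diff: null yields the space code point only.
            set.add((int) ' ');
            return set;
        }
        for (int i = 0; i < delimiters.length; i++) {
            set.add(Character.codePointAt(delimiters, i));
        }
        return set;
    }

    // Title-case the first code point after each delimiter: O(n) expected,
    // because each membership check is a hash lookup, not an array scan.
    static String capitalize(final String str, final char[] delimiters) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        final Set<Integer> delimiterSet = toDelimiterSet(delimiters);
        final StringBuilder out = new StringBuilder(str.length());
        boolean capitalizeNext = true;
        for (int i = 0; i < str.length(); ) {
            final int codePoint = str.codePointAt(i);
            if (delimiterSet.contains(codePoint)) {
                capitalizeNext = true;
                out.appendCodePoint(codePoint);
            } else if (capitalizeNext) {
                out.appendCodePoint(Character.toTitleCase(codePoint));
                capitalizeNext = false;
            } else {
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

One design point worth noting from the diff itself: the removed isDelimiter(char, char[]) treated null delimiters as Character.isWhitespace(ch), whereas a set seeded only with the space code point would no longer treat tabs or newlines as delimiters. Whether that difference is intended is not stated in this message.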



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
