commons-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (TEXT-156) Fix the RegexTokenizer to use a static Pattern
Date Thu, 07 Mar 2019 23:28:00 GMT

     [ https://issues.apache.org/jira/browse/TEXT-156?focusedWorklogId=209836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-209836 ]

ASF GitHub Bot logged work on TEXT-156:
---------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Mar/19 23:27
            Start Date: 07/Mar/19 23:27
    Worklog Time Spent: 10m 
      Work Description: kinow commented on pull request #110: TEXT-156: Fix the RegexTokenizer to use a static Pattern.
URL: https://github.com/apache/commons-text/pull/110#discussion_r263613213
 
 

 ##########
 File path: src/main/java/org/apache/commons/text/similarity/RegexTokenizer.java
 ##########
 @@ -26,12 +26,14 @@
 
 /**
  * A simple word tokenizer that utilizes regex to find words. It applies a regex
- * {@code}(\w)+{@code} over the input text to extract words from a given character
+ * {@code (\w)+} over the input text to extract words from a given character
  * sequence.
  *
  * @since 1.0
  */
 class RegexTokenizer implements Tokenizer<CharSequence> {
+    /** The whitespace pattern. */
+    private static final Pattern pattern = Pattern.compile("(\\w)+");
 
 Review comment:
   s/pattern/PATTERN to fix checkstyle?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 209836)
    Time Spent: 0.5h  (was: 20m)

> Fix the RegexTokenizer to use a static Pattern
> ----------------------------------------------
>
>                 Key: TEXT-156
>                 URL: https://issues.apache.org/jira/browse/TEXT-156
>             Project: Commons Text
>          Issue Type: Improvement
>    Affects Versions: 1.6
>            Reporter: Alex D Herbert
>            Priority: Trivial
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Using a static Pattern avoids recompiling the regex on every invocation. All match state is
> held by the Matcher created from the pattern, so the pattern itself can be shared across threads.
> Also:
>  * Remove the use of CharSequence.toString() to pass to the
> matcher(CharSequence) method. There is no need to create a String.
>  * Fix the javadoc header @code tag.
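
The points in the issue description above can be illustrated with a minimal sketch. This is not the actual Commons Text source: the class name StaticPatternTokenizerSketch and its static tokenize method are hypothetical, chosen only for illustration. It assumes nothing beyond java.util.regex, compiles the regex once into a constant (named PATTERN, as the review comment suggests), and passes the CharSequence straight to matcher(CharSequence) without calling toString().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch for illustration; not the real RegexTokenizer. */
final class StaticPatternTokenizerSketch {

    /** Compiled once when the class loads. Pattern instances are immutable and safe to share. */
    private static final Pattern PATTERN = Pattern.compile("(\\w)+");

    private StaticPatternTokenizerSketch() {
        // static helper only
    }

    /** Returns every run of word characters found in the given text. */
    static String[] tokenize(final CharSequence text) {
        // Each call gets its own Matcher, which holds all per-match state.
        // matcher() accepts a CharSequence, so no String copy is needed.
        final Matcher matcher = PATTERN.matcher(text);
        final List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            tokens.add(matcher.group(0));
        }
        return tokens.toArray(new String[0]);
    }
}
```

Because each call creates a fresh Matcher, concurrent callers never share mutable state; only the immutable Pattern is shared.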



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
