commons-commits mailing list archives

From chtom...@apache.org
Subject [1/6] [text] TEXT-62: Fixing typographical errors, starting userguide
Date Mon, 30 Jan 2017 12:39:15 GMT
Repository: commons-text
Updated Branches:
  refs/heads/master 212288b08 -> 9fa1158ee


TEXT-62: Fixing typographical errors, starting userguide


Project: http://git-wip-us.apache.org/repos/asf/commons-text/repo
Commit: http://git-wip-us.apache.org/repos/asf/commons-text/commit/82542979
Tree: http://git-wip-us.apache.org/repos/asf/commons-text/tree/82542979
Diff: http://git-wip-us.apache.org/repos/asf/commons-text/diff/82542979

Branch: refs/heads/master
Commit: 825429791a58976b6d6b7d03dec441d79d1409ae
Parents: 212288b
Author: Rob Tompkins <chtompki@gmail.com>
Authored: Fri Jan 27 21:24:09 2017 -0500
Committer: Rob Tompkins <chtompki@gmail.com>
Committed: Fri Jan 27 21:24:09 2017 -0500

----------------------------------------------------------------------
 RELEASE-NOTES.txt                               |  18 +-
 pom.xml                                         |   4 +-
 src/assembly/src.xml                            |   6 +
 src/changes/changes.xml                         |   1 +
 .../text/beta/similarity/package-info.java      |   1 +
 src/site/xdoc/userguide.xml                     | 331 ++++++++++++++++++-
 6 files changed, 347 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/RELEASE-NOTES.txt
----------------------------------------------------------------------
diff --git a/RELEASE-NOTES.txt b/RELEASE-NOTES.txt
index 73c3a17..c2a44fb 100644
--- a/RELEASE-NOTES.txt
+++ b/RELEASE-NOTES.txt
@@ -13,6 +13,18 @@ Java environment.
 
 Apache Commons Text is a library focused on algorithms working on strings.
 
+A NOTE ON THE HISTORY OF THE CODE
+=================================
+
+The codebase began in the fall of 2014 as a location for housing algorithms for
+operating on Strings that seemed to have a more complex nature than those which
+would be considered a needed extension to java.lang. Thus, a new component,
+different from Apache Commons Lang, was warranted. As the project evolved, it was
+noticed that Commons Lang had considerably more text manipulation tools than
+the average Java application developer would need or even want. So, we have
+decided to move the more esoteric String processing algorithms out of Commons
+Lang into Commons Text.
+
 JAVA 9 SUPPORT
 ==============
 
@@ -46,6 +58,10 @@ o TEXT-9:    Incorporate String algorithms from Commons Lang Thanks to britter.
 FIXED BUGS
 ==========
 
+Note. We recognize the curiosity of a new component having "fixed bugs," but a
+considerable number of files were migrated over from Commons Lang, some of which
+needed fixes.
+
 o TEXT-60:   Upgrading Jacoco for Java 9-ea compatibility. Thanks to Lee Adcock.
 o TEXT-52:   Possible attacks through StringEscapeUtils.escapeEcmaScrip better
              javadoc
@@ -93,4 +109,4 @@ Apache Commons Text website:
 http://commons.apache.org/text/
 
 Have fun!
--Apachje Commons Text team
\ No newline at end of file
+-Apache Commons Text team
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index 55a4a95..ddeb5c0 100644
--- a/pom.xml
+++ b/pom.xml
@@ -124,8 +124,8 @@
     <maven.compiler.target>1.7</maven.compiler.target>
 
     <commons.componentid>text</commons.componentid>
-    <!-- Current 3.x release series -->
-    <commons.release.version>1.0-beta-1</commons.release.version>
+
+    <commons.release.version>1.0</commons.release.version>
     <commons.release.desc>(Java 7+)</commons.release.desc>
 
     <commons.jira.id>TEXT</commons.jira.id>

http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/src/assembly/src.xml
----------------------------------------------------------------------
diff --git a/src/assembly/src.xml b/src/assembly/src.xml
index ab1da0d..48a8c7c 100644
--- a/src/assembly/src.xml
+++ b/src/assembly/src.xml
@@ -24,10 +24,16 @@
     <fileSets>
         <fileSet>
             <includes>
+                <include>checkstyle.xml</include>
+                <include>checkstyle-supressions.xml</include>
+                <include>CONTRIBUTING.md</include>
+                <include>fb-excludes.xml</include>
                 <include>LICENSE.txt</include>
+                <include>license-header.txt</include>
                 <include>NOTICE.txt</include>
                 <include>pom.xml</include>
                 <include>PROPOSAL.html</include>
+                <include>README.md</include>
                 <include>RELEASE-NOTES.txt</include>
             </includes>
         </fileSet>

http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/src/changes/changes.xml
----------------------------------------------------------------------
diff --git a/src/changes/changes.xml b/src/changes/changes.xml
index 0f8ca6f..116a93d 100644
--- a/src/changes/changes.xml
+++ b/src/changes/changes.xml
@@ -46,6 +46,7 @@ The <action> type attribute can be add,update,fix,remove.
   <body>
 
   <release version="1.0-beta-1" date="2017-01-25" description="First release (beta) of Commons Text">
+    <action issue="TEXT-62" type="fix" dev="chtompki">Incorporate suggestions from RC2 into 1.0 release</action>
     <action issue="TEXT-61" type="update" dev="chtompki" due-to="Lee Adcock">Naming packages org.apache.commons.text.beta</action>
     <action issue="TEXT-60" type="fix" dev="chtompki" due-to="Lee Adcock">Upgrading Jacoco for Java 9-ea compatibility.</action>
     <action issue="TEXT-58" type="update" dev="chtompki">Refactor EntityArrays to have unmodifiableMaps in lieu of String[][]</action>

http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/src/main/java/org/apache/commons/text/beta/similarity/package-info.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/commons/text/beta/similarity/package-info.java b/src/main/java/org/apache/commons/text/beta/similarity/package-info.java
index 914e45a..957901c 100644
--- a/src/main/java/org/apache/commons/text/beta/similarity/package-info.java
+++ b/src/main/java/org/apache/commons/text/beta/similarity/package-info.java
@@ -30,6 +30,7 @@
  * <li>{@link org.apache.commons.text.beta.similarity.HammingDistance Hamming Distance}</li>
  * <li>{@link org.apache.commons.text.beta.similarity.JaroWinklerDistance Jaro-Winkler Distance}</li>
  * <li>{@link org.apache.commons.text.beta.similarity.LevenshteinDistance Levenshtein Distance}</li>
+ * <li>{@link org.apache.commons.text.beta.similarity.LongestCommonSubsequenceDistance Longest Common Subsequence Distance}</li>
  * </ul>
  *
  * <p>The {@link org.apache.commons.text.beta.similarity.CosineDistance Cosine Distance}

http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/src/site/xdoc/userguide.xml
----------------------------------------------------------------------
diff --git a/src/site/xdoc/userguide.xml b/src/site/xdoc/userguide.xml
index 27e4a7d..1c93b2d 100644
--- a/src/site/xdoc/userguide.xml
+++ b/src/site/xdoc/userguide.xml
@@ -6,27 +6,336 @@ this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at
-
      http://www.apache.org/licenses/LICENSE-2.0
-
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
+
 <document>
 
- <properties>
-  <title>Commons Text - User guide</title>
-  <author email="dev@commons.apache.org">Commons Documentation Team</author>
- </properties>
+  <properties>
+    <title>Commons Text - User guide</title>
+    <author email="dev@commons.apache.org">Commons Documentation Team</author>
+  </properties>
+
+  <body>
+    <!-- $Id$ -->
+
+    <section name='User guide for Commons "Text"'>
+      <div align="center">
+        <h1>The Commons <em>Text</em> Package
+        </h1>
+        <h2>Users Guide</h2>
+        <br/>
+        <a href="#Description">[Description]</a>
+        <a href="#text.beta.">[text.beta.*]</a>
+        <a href="#text.beta.diff.">[text.beta.diff.*]</a>
+        <a href="#text.beta.similarity.">[text.beta.similarity.*]</a>
+        <a href="#text.beta.translate.">[text.beta.translate.*]</a>
+        <br/>
+        <br/>
+      </div>
+    </section>
+
+    <section name="Description">
+      <p>The Commons Text library provides additions to the standard JDK's
+        java.lang package. Very generic, very reusable components for everyday
+        use.
+      </p>
+      <p>The text package was added in Commons Lang 2.2. It provides, amongst
+        other classes, a replacement for StringBuffer named <code>
+          StrBuilder</code>, a class for substituting variables within a String
+        named <code>StrSubstitutor</code> and a replacement for StringTokenizer
+        named <code>StrTokenizer</code>. While somewhat ungainly, the <code>
+          Str
+        </code> prefix has been used to ensure we don't clash with any current
+        or future standard Java classes.
+      </p>
+    </section>
+
+    <section name="text.beta.*">
+      <!--
+      AlphabetConverter
+      Builder
+      CharacterPredicate
+      CharacterPredicates
+      CompositeFormat
+      ExtendedMessageFormat
+      FormatFactory
+      FormattableUtils
+      StrLookup
+      StrSubstitutor
+      StrBuilder
+      StrMatcher
+      StrTokenizer
+      StringEscapeUtils
+      -->
+      <p>Originally the text package was added in Commons Lang 2.2. However, its
+        new home is here. It provides, amongst other
+        classes, a replacement for <code>StringBuffer</code> named <code>
+          StrBuilder</code>, a class for substituting variables within a String
+        named <code>StrSubstitutor</code> and a replacement for StringTokenizer
+        named <code>StrTokenizer</code>. While somewhat ungainly, the <code>
+          Str
+        </code> prefix has been used to ensure we don't clash with any current
+        or future standard Java classes.
+      </p>
+
+      <subsection name="String manipulation - StringEscapeUtils">
+        <p>Text has a series of String utilities. The first is StringUtils,
+          oodles and oodles of functions which tweak, transform, squeeze and
+          cuddle java.lang.Strings. In addition to StringUtils, there are a
+          series of other String manipulating classes; RandomStringUtils,
+          StringEscapeUtils and Tokenizer. RandomStringUtils speaks for itself.
+          It provides ways in which to generate pieces of text, such as might
+          be used for default passwords. StringEscapeUtils contains methods to
+          escape and unescape Java, JavaScript, HTML, XML and SQL. Tokenizer is
+          an improved alternative to java.util.StringTokenizer.
+        </p>
+        <p>These are ideal classes to start using if you're looking to get into
+          Text. StringUtils' capitalize, substringBetween/Before/After, split
+          and join are good methods to begin with. If you use
+          java.sql.Statements a lot, StringEscapeUtils.escapeSql might be of
+          interest.
+        </p>
+        <p>In addition to these classes, WordUtils is another String
+          manipulator. It works on Strings at the word level, for example
+          WordUtils.capitalize will capitalize every word in a piece of text.
+          WordUtils also contains methods to wrap text.
+        </p>
+      </subsection>
+
+      <subsection
+              name="Character handling - CharSetUtils, CharSet, CharRange, CharUtils">
+        <p>In addition to dealing with Strings, it's also important to deal with
+          chars and Characters. CharUtils exists for this purpose, while
+          CharSetUtils exists for set-manipulation of Strings. Be careful,
+          although CharSetUtils takes an argument of type String, it is only as
+          a set of characters. For example, <code>
+            CharSetUtils.delete("testtest", "tr")
+          </code> will remove all t's and all r's from the String, not just the
+          String "tr".
+        </p>
+        <p>CharRange and CharSet are both used internally by CharSetUtils, and
+          will probably rarely be used.
+        </p>
+      </subsection>
+
+      <subsection name="JVM interaction - SystemUtils, CharEncoding">
+        <p>SystemUtils is a simple little class which makes it easy to find out
+          information about which platform you are on. For some, this is a
+          necessary evil. It was never something I expected to use myself until
+          I was trying to ensure that Commons Text itself compiled under JDK
+          1.2. Having pushed out a few JDK 1.3 bits that had slipped in (<code>
+            Collections.EMPTY_MAP
+          </code> is a classic offender), I then found that one of the Unit
+          Tests was dying mysteriously under JDK 1.2, but ran fine under JDK
+          1.3. There was no obvious solution and I needed to move onwards, so
+          the simple solution was to wrap that particular test in a <code>
+            if(SystemUtils.isJavaVersionAtLeast(1.3f)) {</code>, make a note and
+          move on.
+        </p>
+        <p>The CharEncoding class is also used to interact with the Java
+          environment and may be used to see which character encodings are
+          supported in a particular environment.
+        </p>
+      </subsection>
+
+      <subsection
+              name="Serialization - SerializationUtils, SerializationException">
+        <p>Serialization doesn't have to be that hard! A simple util class can
+          take away the pain, plus it provides a method to clone an object by
+          serializing and then deserializing it, an old Java trick.
+        </p>
+      </subsection>
+
+      <subsection
+              name="Assorted functions - ObjectUtils, ClassUtils, ArrayUtils, BooleanUtils">
+        <p>Would you believe it, ObjectUtils contains handy functions for
+          Objects, mainly null-safe implementations of the methods on
+          java.lang.Object.
+        </p>
+        <p>ClassUtils is largely a set of helper methods for reflection. Of
+          special note are the comparators hidden away in ClassUtils, useful for
+          sorting Class and Package objects by name; however they merely sort
+          alphabetically and don't understand the common habit of sorting <code>
+            java
+          </code> and <code>javax</code> first.
+        </p>
+        <p>Next up, ArrayUtils. This is a big one with many methods and many
+          overloads of these methods so it is probably worth an in depth look
+          here. Before we begin, assume that every method mentioned is
+          overloaded for all the primitives and for Object. Also, the short-hand
+          'xxx' implies a generic primitive type, but usually also includes
+          Object.
+        </p>
+        <ul>
+          <li>ArrayUtils provides singleton empty arrays for all the basic
+            types. These will largely be of use in the Collections API with its
+            toArray methods, but also will be of use with methods which want to
+            return an empty array on error.
+          </li>
+          <li>
+            <code>add(xxx[], xxx)</code>
+            will add a primitive type to an array, resizing the array as you'd
+            expect. Object is also supported.
+          </li>
+          <li>
+            <code>clone(xxx[])</code>
+            clones a primitive or Object array.
+          </li>
+          <li>
+            <code>contains(xxx[], xxx)</code>
+            searches for a primitive or Object in a primitive or Object array.
+          </li>
+          <li>
+            <code>getLength(Object)</code>
+            returns the length of any array, or throws an IllegalArgumentException if
+            the parameter is not an array. <code>hashCode(Object)</code>, <code>
+            equals(Object, Object)</code>,
+            <code>toString(Object)</code>
+          </li>
+          <li>
+            <code>indexOf(xxx[], xxx)</code>
+            and <code>indexOf(xxx[], xxx, int)</code> are copies of the classic
+            String methods, but this time for primitive/Object arrays. In
+            addition, a lastIndexOf set of methods exists.
+          </li>
+          <li>
+            <code>isEmpty(xxx[])</code>
+            lets you know if an array is zero-sized or null.
+          </li>
+          <li>
+            <code>isSameLength(xxx[], xxx[])</code>
+            returns true if the arrays are the same length.
+          </li>
+          <li>Alongside the add methods, there are also remove methods of two
+            types. The first type removes the value at an index, <code>
+              remove(xxx[], int)</code>, while the second type removes the first
+            occurrence of a value from the array, <code>remove(xxx[], xxx)</code>.
+          </li>
+          <li>Nearing the end now. The <code>reverse(xxx[])</code> method turns
+            an array around.
+          </li>
+          <li>The <code>subarray(xxx[], int, int)</code> method splices an array
+            out of a larger array.
+          </li>
+          <li>Primitive to primitive wrapper conversion is handled by the <code>
+            toObject(xxx[])
+          </code> and <code>toPrimitive(Xxx[])</code> methods.
+          </li>
+        </ul>
+        <p>Lastly, <code>ArrayUtils.toMap(Object[])</code> is worthy of special
+          note. It is not a heavily overloaded method for working with arrays,
+          but a simple way to create Maps from literals.
+        </p>
+        <h5>Using toMap</h5>
+        <source>
+          Map colorMap = ArrayUtils.toMap(new String[][] {
+          {"RED", "#FF0000"},
+          {"GREEN", "#00FF00"},
+          {"BLUE", "#0000FF"}
+          });
+        </source>
+
+        <p>Our final util class is BooleanUtils. It contains various Boolean
+          acting methods, probably of most interest is the <code>
+            BooleanUtils.toBoolean(String)
+          </code> method which turns various positive/negative Strings into a
+          Boolean object, and not just true/false as with Boolean.valueOf.
+        </p>
+      </subsection>
+
+
+    </section>
+
+    <section name="text.beta.diff.*">
+      <!--
+      CommandVisitor
+      DeleteCommand
+      EditCommand
+      EditScript
+      InsertCommand
+      KeepCommand
+      ReplacementsFinder
+      ReplacementsHandler
+      StringsComparator
+      -->
+      <p>Provides algorithms for diff between strings.</p>
+      <p>The initial implementation of the Myers algorithm was adapted from the
+        commons-collections sequence package.
+      </p>
+    </section>
+
+    <section name="text.beta.similarity.*">
+      <!--
+      Enum
+      EnumUtils
+      ValuedEnum
+      -->
+      <p>Provides algorithms for string similarity.</p>
+
+      <p>The algorithms that implement the EditDistance interface follow the
+        same
+        simple principle: the more similar (closer) the strings are, the lower
+        the distance.
+        For example, the words house and hose are closer than house and
+        trousers.
+      </p>
+
+      <p>The following algorithms are available at the moment:</p>
+
+      <ul>
+        <li>
+          <code>CosineDistance</code>
+        </li>
+        <li>
+          <code>CosineSimilarity</code>
+        </li>
+        <li>
+          <code>FuzzyScore</code>
+        </li>
+        <li>
+          <code>HammingDistance</code>
+        </li>
+        <li>
+          <code>JaroWinklerDistance</code>
+        </li>
+        <li>
+          <code>LevenshteinDistance</code>
+        </li>
+        <li>
+          <code>LongestCommonSubsequenceDistance</code>
+        </li>
+      </ul>
 
- <body>
+      <p>The <code>CosineDistance</code> utilises a
+        <code>RegexTokenizer</code>
+        regular expression tokenizer (\w+). And the <code>
+          LevenshteinDistance</code>'s
+        behaviour can be changed to take into consideration a maximum
+        threshold.
+      </p>
+    </section>
 
-  <section name='User guide for Commons "Text"'>
-    TODO
-  </section>
+    <section name="text.beta.translate.*">
+      <!--
+      ExceptionUtils
+      Nestable
+      NestableDelegate
+      NestableError
+      NestableException
+      NestableRuntimeException
+      -->
+      <p>An API for creating text translation routines from a set of smaller
+        building blocks. Initially created to make it possible for the user to
+        customize the rules in the StringEscapeUtils class.
+      </p>
+      <p>These classes are immutable, and therefore thread-safe.</p>
+    </section>
 
-</body>
+  </body>
 </document>
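
Illustration (not part of the commit): the similarity section of the user
guide above says that the closer two strings are, the lower their edit
distance, using house/hose versus house/trousers as the example. The sketch
below shows that principle with a textbook Levenshtein computation in plain
Java. The class name EditDistanceSketch and its method are hypothetical and
exist only for this example; this is not the Commons Text implementation and
makes no claim about that library's API.

```java
public class EditDistanceSketch {

    /**
     * Textbook dynamic-programming Levenshtein distance, where each
     * insertion, deletion and substitution costs 1.
     * Hypothetical helper for illustration only.
     */
    public static int levenshtein(CharSequence left, CharSequence right) {
        int[] previous = new int[right.length() + 1];
        int[] current = new int[right.length() + 1];

        // Turning an empty prefix of 'left' into the first j chars of 'right'
        // takes j insertions.
        for (int j = 0; j <= right.length(); j++) {
            previous[j] = j;
        }

        for (int i = 1; i <= left.length(); i++) {
            // Turning the first i chars of 'left' into an empty string
            // takes i deletions.
            current[0] = i;
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[right.length()];
    }

    public static void main(String[] args) {
        // "house" -> "hose" needs one deletion; "house" -> "trousers" needs more edits.
        System.out.println("house/hose:     " + levenshtein("house", "hose"));
        System.out.println("house/trousers: " + levenshtein("house", "trousers"));
    }
}
```

Running main prints a smaller number for house/hose than for house/trousers,
which is the ordering the user guide text describes.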

