Originally the text package was added in Commons Lang 2.2. However, its
new home is here. It provides, amongst other
classes, a replacement for Text has a series of String utilities. The first is StringUtils,
- oodles and oodles of functions which tweak, transform, squeeze and
- cuddle java.lang.Strings. In addition to StringUtils, there are a
- series of other String manipulating classes; RandomStringUtils,
- StringEscapeUtils and Tokenizer. RandomStringUtils speaks for itself.
+ Beyond the text utilities ported over from lang, we have also included various
+ string similarity and distance functions. Lastly, there are also utilities for
+ computing the differences between bodies of text so that those differences
+ can be viewed.
+ As of Lang 3.5, StringEscapeUtils and StrTokenizer have moved into Text.
It provides ways in which to generate pieces of text, such as might
be used for default passwords. StringEscapeUtils contains methods to
- escape and unescape Java, JavaScript, HTML, XML and SQL. Tokenizer is
+ escape and unescape Java, JavaScript, HTML, XML and SQL. It is worth noting that
+ the package These are ideal classes to start using if you're looking to get into
- Text. StringUtils' capitalize, substringBetween/Before/After, split
- and join are good methods to begin with. If you use
- java.sql.Statements a lot, StringEscapeUtils.escapeSql might be of
- interest.
- In addition to these classes, WordUtils is another String
- manipulator. It works on Strings at the word level, for example
- WordUtils.capitalize will capitalize every word in a piece of text.
- WordUtils also contains methods to wrap text.
- In addition to dealing with Strings, it's also important to deal with
- chars and Characters. CharUtils exists for this purpose, while
- CharSetUtils exists for set-manipulation of Strings. Be careful,
- although CharSetUtils takes an argument of type String, it is only as
- a set of characters. For example, The CharRange and CharSet are both used internally by CharSetUtils, and
- will probaby rarely be used.
- SystemUtils is a simple little class which makes it easy to find out
- information about which platform you are on. For some, this is a
- necessary evil. It was never something I expected to use myself until
- I was trying to ensure that Commons Text itself compiled under JDK
- 1.2. Having pushed out a few JDK 1.3 bits that had slipped in ( The CharEncoding class is also used to interact with the Java
- environment and may be used to see which character encodings are
- supported in a particular environment.
- Serialization doesn't have to be that hard! A simple util class can
- take away the pain, plus it provides a method to clone an object by
- unserializing and reserializing, an old Java trick.
+
+ The list of "edit distances" that we currently support follows:
+ StringBuffer
named
@@ -84,168 +69,63 @@ limitations under the License.
or future standard Java classes.
org.apache.commons.text.beta.translate
holds the
+ functionality underpinning the StringEscapeUtils, with mappings and translations
+ between such mappings for the sake of doing String escaping. StrTokenizer is
an improved alternative to java.util.StringTokenizer.
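As a rough illustration of why an alternative to java.util.StringTokenizer can be useful, the sketch below shows one well-known limitation of the JDK class: it silently drops empty tokens between adjacent delimiters. The class and the keepingEmptyTokens helper are hypothetical names invented for this example. The sketch does not use the StrTokenizer API, and it is not a claim about how StrTokenizer is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; none of these names come from Commons Text.
final class TokenizerComparison {

    // Collects tokens with java.util.StringTokenizer, which treats a run of
    // adjacent delimiters as one separator and so never yields empty tokens.
    static List<String> withStringTokenizer(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Hypothetical helper: splits on a single delimiter character and keeps
    // empty fields, the behaviour one often wants for delimited records.
    static List<String> keepingEmptyTokens(String input, char delimiter) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == delimiter) {
                tokens.add(input.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(input.substring(start)); // trailing field, possibly empty
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(withStringTokenizer("a,,b", ",")); // prints [a, b]
        System.out.println(keepingEmptyTokens("a,,b", ','));  // prints [a, , b]
    }
}
```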
- CharSetUtils.delete("testtest", "tr")
-
will remove all t's and all r's from the String, not just the
- String "tr".
+ similarity
package contains various mechanisms of
+ calculating "similarity scores" as well as "edit distances" between Strings. Note,
+ the difference between a "similarity score" and a "distance function" is that
+ a distance function meets the following qualifications:
+
+
+ whereas a "similarity score" need not satisfy all such properties. Though, it
+ is fairly easy to "normalize" a similarity score to manufacture an "edit distance."
d(x,y) >= 0, non-negativity or separation axiom
d(x,y) == 0, if and only if, x == y
d(x,y) == d(y,x), symmetry, and
d(x,z) <= d(x,y) + d(y,z), the triangle inequality
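To make the four qualifications concrete, here is a minimal sketch that checks them for one triple of strings. It assumes the Levenshtein distance as a representative edit distance and uses a hand-written dynamic-programming implementation; the class and method names are hypothetical and are not the classes shipped in this package.

```java
// Hypothetical demo class; a hand-written Levenshtein distance is assumed
// as the example "edit distance" and is not taken from Commons Text.
final class MetricAxiomsSketch {

    // Classic two-row dynamic programming: insertion, deletion and
    // substitution each cost 1.
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    // Checks the four qualifications for one particular triple (x, y, z).
    // Passing for a sample proves nothing in general; it only illustrates
    // what each property says.
    static boolean satisfiesAxioms(String x, String y, String z) {
        boolean nonNegative = levenshtein(x, y) >= 0;
        boolean identity = (levenshtein(x, y) == 0) == x.equals(y);
        boolean symmetric = levenshtein(x, y) == levenshtein(y, x);
        boolean triangle =
                levenshtein(x, z) <= levenshtein(x, y) + levenshtein(y, z);
        return nonNegative && identity && symmetric && triangle;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));               // prints 3
        System.out.println(satisfiesAxioms("kitten", "sitting", "mitten")); // prints true
    }
}
```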
- Collections.EMPTY_MAP
-
is a classic offender), I then found that one of the Unit
- Tests was dying mysteriously under JDK 1.2, but ran fine under JDK
- 1.3. There was no obvious solution and I needed to move onwards, so
- the simple solution was to wrap that particular test in a
- if(SystemUtils.isJavaVersionAtLeast(1.3f)) {
, make a note and
- move on.
-
+
+ and the list of "similarity scores" that we support follows:
+
+
Would you believe it, ObjectUtils contains handy functions for
- Objects, mainly null-safe implementations of the methods on
- java.lang.Object.
-
-ClassUtils is largely a set of helper methods for reflection. Of
- special note are the comparators hidden away in ClassUtils, useful for
- sorting Class and Package objects by name; however they merely sort
- alphabetically and don't understand the common habit of sorting
- java
-
and javax
first.
-
Next up, ArrayUtils. This is a big one with many methods and many
- overloads of these methods so it is probably worth an in depth look
- here. Before we begin, assume that every method mentioned is
- overloaded for all the primitives and for Object. Also, the short-hand
- 'xxx' implies a generic primitive type, but usually also includes
- Object.
-
-add(xxx[], xxx)
- will add a primitive type to an array, resizing the array as you'd
- expect. Object is also supported.
- clone(xxx[])
- clones a primitive or Object array.
- contains(xxx[], xxx)
- searches for a primitive or Object in a primitive or Object array.
- getLength(Object)
- returns the length of any array or an IllegalArgumentException if
- the parameter is not an array. hashCode(Object)
,
- equals(Object, Object)
,
- toString(Object)
- indexOf(xxx[], xxx)
- and indexOf(xxx[], xxx, int)
are copies of the classic
- String methods, but this time for primitive/Object arrays. In
- addition, a lastIndexOf set of methods exists.
- isEmpty(xxx[])
- lets you know if an array is zero-sized or null.
- isSameLength(xxx[], xxx[])
- returns true if the arrays are the same length.
-
- remove(xxx[], int)
, while the second type remove the first
- value from the array, remove(xxx[], xxx)
.
- reverse(xxx[])
method turns
- an array around.
- subarray(xxx[], int, int)
method splices an array
- out of a larger array.
-
- toObject(xxx[])
-
and toPrimitive(Xxx[])
methods.
- Lastly, ArrayUtils.toMap(Object[])
is worthy of special
- note. It is not a heavily overloaded method for working with arrays,
- but a simple way to create Maps from literals.
-
Our final util class is BooleanUtils. It contains various Boolean
- acting methods, probably of most interest is the
- BooleanUtils.toBoolean(String)
-
method which turns various positive/negative Strings into a
- Boolean object, and not just true/false as with Boolean.valueOf.
+ name="Text diff'ing">
+
The org.apache.commons.text.beta.diff
package contains code for
+ computing diffs between strings. The initial implementation of the Myers algorithm was adapted from the
+ commons-collections sequence package.
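For readers unfamiliar with the Myers algorithm, the following is a minimal sketch of its greedy forward search, which finds the smallest number of single-character insertions and deletions that turn one string into another. The class and method names are hypothetical. This is an independent illustration, not the code in the diff package or in commons-collections, and it returns only the length of the edit script rather than the script itself.

```java
// Hypothetical demo class illustrating the greedy Myers search; it is not
// the implementation shipped in the diff package.
final class MyersSketch {

    // Returns the minimum number of insertions plus deletions needed to
    // transform a into b (substitutions count as one delete and one insert).
    static int shortestEditLength(String a, String b) {
        int n = a.length();
        int m = b.length();
        int max = n + m;
        int offset = max;                 // shifts diagonal k into array range
        int[] v = new int[2 * max + 2];   // furthest x reached on each diagonal

        for (int d = 0; d <= max; d++) {
            for (int k = -d; k <= d; k += 2) {
                int x;
                if (k == -d || (k != d && v[offset + k - 1] < v[offset + k + 1])) {
                    x = v[offset + k + 1];      // step down: insert from b
                } else {
                    x = v[offset + k - 1] + 1;  // step right: delete from a
                }
                int y = x - k;
                // Follow the "snake": matching characters cost nothing.
                while (x < n && y < m && a.charAt(x) == b.charAt(y)) {
                    x++;
                    y++;
                }
                v[offset + k] = x;
                if (x >= n && y >= m) {
                    return d;
                }
            }
        }
        return max; // not reached: d = n + m always suffices
    }

    public static void main(String[] args) {
        System.out.println(shortestEditLength("ABCABBA", "CBABAC")); // prints 5
    }
}
```

Recording the path taken on each diagonal, instead of just the furthest x, would allow the actual sequence of insertions and deletions to be recovered.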