File path: src/main/java/org/apache/commons/text/similarity/IntersectionResult.java
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.text.similarity;
+
+import java.util.Objects;
+
+/**
+ * Container class to store the intersection results between two sets.
+ *
+ * <p>Stores the size of set A, set B and the intersection of A and B (<code>|A ∩ B|</code>).
+ * The result can be used to produce various similarity metrics, for example the Jaccard index or
+ * Sørensen-Dice coefficient (F1 score).</p>
+ *
+ * <p>This class is immutable.</p>
+ *
+ * @since 1.7
+ * @see <a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard index</a>
+ * @see <a href="http://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient">Sørensen-Dice coefficient</a>
+ * @see <a href="https://en.wikipedia.org/wiki/F1_score">F1 score</a>
+ */
+public class IntersectionResult {
+ /**
+ * The size of set A.
+ */
+ private final int sizeA;
+ /**
+ * The size of set B.
+ */
+ private final int sizeB;
+ /**
+ * The size of the intersection between set A and B.
+ */
+ private final int intersection;
+
+ /**
+ * Create the results for an intersection between two sets.
+ *
+ * @param sizeA the size of set A ({@code |A|})
+ * @param sizeB the size of set B ({@code |B|})
+ * @param intersection the size of the intersection of A and B (<code>|A ∩ B|</code>)
+ * @throws IllegalArgumentException if the sizes are negative or the intersection is greater
+ * than the minimum of the two set sizes
+ */
+ public IntersectionResult(final int sizeA, final int sizeB, final int intersection) {
+ if (sizeA < 0) {
+ throw new IllegalArgumentException("Set size A is not positive: " + sizeA);
+ }
+ if (sizeB < 0) {
+ throw new IllegalArgumentException("Set size B is not positive: " + sizeB);
+ }
+ if (intersection < 0 || intersection > Math.min(sizeA, sizeB)) {
+ throw new IllegalArgumentException("Invalid intersection of A and B: " + intersection);
+ }
+ this.sizeA = sizeA;
+ this.sizeB = sizeB;
+ this.intersection = intersection;
+ }
+
+ /**
+ * Get the size of set A.
+ *
+ * @return |A|
+ */
+ public int getSizeA() {
+ return sizeA;
+ }
+
+ /**
+ * Get the size of set B.
+ *
+ * @return |B|
+ */
+ public int getSizeB() {
+ return sizeB;
+ }
+
+ /**
+ * Get the size of the intersection between set A and B.
+ *
+ * @return <code>|A ∩ B|</code>
+ */
+ public int getIntersection() {
+ return intersection;
+ }
+
+ /**
+ * Get the size of the union between set A and B.
+ *
+ * @return <code>|A ∪ B|</code>
+ */
+ public long getUnion() {
+ return (long) sizeA + sizeB - intersection;
+ }
+
+ /**
+ * Gets the Jaccard index. The Jaccard is the intersection divided by the union.
+ *
+ * <pre><code>|A ∩ B| / |A ∪ B|</code></pre>
+ *
+ * <p>This implementation defines the result as zero if there is no intersection,
+ * even when the union is zero to avoid a {@link Double#NaN} result.</p>
+ *
+ * @return the Jaccard index
+ * @see <a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard index</a>
+ */
+ public double getJaccardIndex() {
+ return intersection == 0 ? 0.0 : (double) intersection / getUnion();
+ }
Review comment:
IMO this is the correct thing to do. Make Jaccard use this class.
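
To illustrate why, here is a minimal standalone sketch showing that the three stored sizes are enough to derive both the Jaccard index and the Sørensen-Dice coefficient. It does not use the class from this PR; `SetMetricsSketch` and its method names are hypothetical, and returning zero for an empty intersection in the Dice method is an assumption that mirrors the Jaccard handling in the diff.

```java
public final class SetMetricsSketch {

    private SetMetricsSketch() {
        // Static helpers only.
    }

    /** Jaccard index: intersection size divided by union size, 0 when there is no intersection. */
    public static double jaccard(int sizeA, int sizeB, int intersection) {
        // Widen to long before adding so that two large sizes cannot overflow int.
        final long union = (long) sizeA + sizeB - intersection;
        return intersection == 0 ? 0.0 : (double) intersection / union;
    }

    /** Sorensen-Dice coefficient: twice the intersection size divided by the sum of both sizes. */
    public static double sorensenDice(int sizeA, int sizeB, int intersection) {
        final long total = (long) sizeA + sizeB;
        // Assumption: return 0 rather than NaN when both sets are empty.
        return intersection == 0 ? 0.0 : 2.0 * intersection / total;
    }

    public static void main(String[] args) {
        // Set A has 4 elements, set B has 3, and they share 2, so the union has 5.
        System.out.println(jaccard(4, 3, 2));      // 2 / 5
        System.out.println(sorensenDice(4, 3, 2)); // 4 / 7
    }
}
```

With the metrics derived from one result object, a Jaccard implementation would only need to build the two sets and count the overlap.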

> Add a generic SetSimilarity measure
> The {{SimilarityScore<T>}} interface can be used to compute a generic result. I propose to add a class that can compute the intersection between two sets formed from the characters. The sets must be formed from the {{CharSequence}} input to the {{apply}} method using a {{Function<CharSequence, Set<T>>}} to convert the {{CharSequence}}. This function can be passed to the {{SimilarityScore<T>}} during construction.
> The result can then be computed to have the size of each set and the intersection.
> I have created an implementation that can compute the equivalent of the {{JaccardSimilarity}} class by creating {{Set<Character>}} and also the F1-score using bigrams (pairs of characters) by creating {{Set<String>}}. This relates to [Text-126|https://issues.apache.org/jira/projects/TEXT/issues/TEXT-126] which suggested an algorithm for the Sorensen-Dice similarity, also known as the F1-score.
> Here is an example:
> {code:java}
> // Match the functionality of the JaccardSimilarity class
> Function<CharSequence, Set<Character>> converter = (cs) -> {
>     final Set<Character> set = new HashSet<>();
>     for (int i = 0; i < cs.length(); i++) {
>         set.add(cs.charAt(i));
>     }
>     return set;
> };
> IntersectionSimilarity<Character> similarity = new IntersectionSimilarity<>(converter);
> IntersectionResult result = similarity.apply("something", "something else");
> {code}
> The result has the size of set A, set B and the intersection between them.
> This class was inspired by my look through the various similarity implementations. All of them except the {{CosineSimilarity}} perform single character matching between the input {{CharSequence}}s. The {{CosineSimilarity}} tokenises using whitespace to create words.
> This more generic type of implementation will allow a user to determine how to divide the {{CharSequence}} to create the sets that are compared, e.g. single characters, words, bigrams, etc.
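
For the bigram case the quoted issue mentions, a converter might look roughly like the sketch below. It is standalone and does not call the proposed `IntersectionSimilarity`; the `BigramSketch` class, the `BIGRAMS` converter and the `sizes` helper are hypothetical names, and the set arithmetic uses only `java.util`.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

public final class BigramSketch {

    private BigramSketch() {
        // Static helpers only.
    }

    /** Hypothetical converter: the set of distinct bigrams (pairs of adjacent characters). */
    public static final Function<CharSequence, Set<String>> BIGRAMS = cs -> {
        final Set<String> set = new HashSet<>();
        for (int i = 0; i + 1 < cs.length(); i++) {
            set.add(cs.subSequence(i, i + 2).toString());
        }
        return set;
    };

    /** Returns the size of set A, the size of set B and the size of their intersection. */
    public static <T> int[] sizes(Function<CharSequence, Set<T>> converter,
                                  CharSequence left, CharSequence right) {
        final Set<T> a = converter.apply(left);
        final Set<T> b = converter.apply(right);
        // Copy A so that retainAll does not modify the converter's result.
        final Set<T> common = new HashSet<>(a);
        common.retainAll(b);
        return new int[] {a.size(), b.size(), common.size()};
    }

    public static void main(String[] args) {
        // "night" gives {ni, ig, gh, ht}; "nacht" gives {na, ac, ch, ht}; only "ht" is shared.
        final int[] r = sizes(BIGRAMS, "night", "nacht");
        final double dice = 2.0 * r[2] / (r[0] + r[1]);
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " dice=" + dice);
    }
}
```

Swapping `BIGRAMS` for a single-character or whitespace-tokenising converter changes what is compared without touching the intersection logic, which is the flexibility the proposal is after.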

