From: GitBox
To: issues@commons.apache.org
Subject: [GitHub] [commons-text] aherbert commented on issue #109: TEXT-155: Add a generic OverlapSimilarity measure
Date: Sun, 10 Mar 2019 13:36:42 -0000
Message-ID: <155222500267.28703.10251132640233980044.gitbox@gitbox.apache.org>

aherbert commented on issue #109: TEXT-155: Add a generic OverlapSimilarity measure
URL: https://github.com/apache/commons-text/pull/109#issuecomment-471306795

I have tried to clean up the history into a single commit.

I have changed the name back to `IntersectionSimilarity`, as it was pointed out to me that `Overlap` has a specific meaning in the combinatorics on words space: an “overlap” is a specific repeated pattern. There is also an [OverlapCoefficient](https://en.wikipedia.org/wiki/Overlap_coefficient) between sets, which is the size of the intersection divided by the size of the smaller of the two sets.

I dropped the computation of the metrics and the union from the `IntersectionResult`. This class now has no logic and just holds data.
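As a rough illustration of the overlap coefficient mentioned above (this is not code from the PR), a minimal sketch might look like the following. The class name, the method names, and the choice to split a `CharSequence` into distinct characters are all hypothetical, and returning 0 for an empty set is just a convention chosen here.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; not part of commons-text. */
final class OverlapCoefficientSketch {

    /** Splits a sequence into its set of distinct characters (an assumed tokenisation). */
    static Set<Character> toCharSet(CharSequence cs) {
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < cs.length(); i++) {
            set.add(cs.charAt(i));
        }
        return set;
    }

    /**
     * Size of the intersection divided by the size of the smaller set.
     * Returns 0 when either set is empty (a convention chosen for this sketch).
     */
    static <T> double overlapCoefficient(Set<T> a, Set<T> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        // Iterate over the smaller set and probe the larger one.
        Set<T> smaller = a.size() <= b.size() ? a : b;
        Set<T> larger = smaller == a ? b : a;
        int intersection = 0;
        for (T element : smaller) {
            if (larger.contains(element)) {
                intersection++;
            }
        }
        return (double) intersection / smaller.size();
    }
}
```

Because the denominator is the smaller set, a set that is fully contained in the other scores 1.0 regardless of how large the other set is, which is one reason the name is easy to confuse with a plain intersection count.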
I removed the use of streams and now use a classic iteration over the smaller of the two sets of keys to get the intersection.

This is now a generic set similarity which just requires a function to split up a `CharSequence`. A place to provide such functions, as contained in the example unit tests, is best left to another block of new functionality.
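To make the approach described above concrete, here is a minimal sketch of computing an intersection by iterating over the smaller of two key sets, given a function that splits up a `CharSequence`. It is not the code in the PR: the class and method names are hypothetical, counting elements in a `Map` (bag semantics) is an assumption, and the bigram splitter is only one example of a splitting function.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration only; not the implementation in the PR. */
final class IntersectionSketch {

    /** Example splitter (an assumption): overlapping character bigrams. */
    static Collection<String> bigrams(CharSequence cs) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= cs.length(); i++) {
            out.add(cs.subSequence(i, i + 2).toString());
        }
        return out;
    }

    /** Counts occurrences of each element. */
    static <T> Map<T, Integer> toBag(Collection<T> items) {
        Map<T, Integer> bag = new HashMap<>();
        for (T item : items) {
            bag.merge(item, 1, Integer::sum);
        }
        return bag;
    }

    /** Size of the intersection, taking the minimum count of each shared key. */
    static <T> int intersection(CharSequence left, CharSequence right,
                                Function<CharSequence, Collection<T>> splitter) {
        Map<T, Integer> a = toBag(splitter.apply(left));
        Map<T, Integer> b = toBag(splitter.apply(right));
        // Classic loop over the smaller key set, probing the larger map.
        Map<T, Integer> smaller = a.size() <= b.size() ? a : b;
        Map<T, Integer> larger = smaller == a ? b : a;
        int total = 0;
        for (Map.Entry<T, Integer> entry : smaller.entrySet()) {
            Integer other = larger.get(entry.getKey());
            if (other != null) {
                total += Math.min(entry.getValue(), other);
            }
        }
        return total;
    }
}
```

A call such as `IntersectionSketch.intersection("night", "nacht", IntersectionSketch::bigrams)` passes the splitter as a method reference, so other tokenisations (words, single characters) can be swapped in without touching the intersection logic.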