commons-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [commons-text] aherbert commented on issue #109: TEXT-155: Add a generic OverlapSimilarity measure
Date Sun, 10 Mar 2019 13:36:42 GMT
aherbert commented on issue #109: TEXT-155: Add a generic OverlapSimilarity measure
URL: https://github.com/apache/commons-text/pull/109#issuecomment-471306795
 
 
   I have tried to clean up the history into a single commit.
   
   I have changed the name back to `IntersectionSimilarity`. It was pointed out to me that
`Overlap` has a specific meaning in combinatorics on words, where an “overlap” is
a specific repeated pattern. There is also an [OverlapCoefficient](https://en.wikipedia.org/wiki/Overlap_coefficient)
between sets, which is the size of the intersection divided by the size of the smaller of the two sets.
   
   I dropped the computation of the metrics and the union from the `IntersectionResult`. This
class now has no logic but just holds data.
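
   To illustrate the separation described above, here is a minimal sketch of a data-only result
with the metrics computed outside it. The names `IntersectionCounts` and `SetMetrics` are
hypothetical and are not the classes in this PR; returning 0 for empty inputs is also an
assumption made for the sketch.

```java
// Hypothetical data holder: it stores the three counts and has no logic.
final class IntersectionCounts {
    final int sizeA;
    final int sizeB;
    final int intersection;

    IntersectionCounts(int sizeA, int sizeB, int intersection) {
        this.sizeA = sizeA;
        this.sizeB = sizeB;
        this.intersection = intersection;
    }
}

// Hypothetical helper: metrics are derived from the counts by the caller.
final class SetMetrics {
    // |A union B| = |A| + |B| - |A intersect B|
    static int union(IntersectionCounts c) {
        return c.sizeA + c.sizeB - c.intersection;
    }

    // Overlap coefficient: intersection over the size of the smaller set.
    // Returning 0 when a set is empty is an assumption of this sketch.
    static double overlapCoefficient(IntersectionCounts c) {
        int min = Math.min(c.sizeA, c.sizeB);
        return min == 0 ? 0.0 : (double) c.intersection / min;
    }

    // Jaccard index: intersection over union (0 for two empty sets, by assumption).
    static double jaccard(IntersectionCounts c) {
        int u = union(c);
        return u == 0 ? 0.0 : (double) c.intersection / u;
    }
}
```

   Keeping the holder free of logic means any set metric can be derived from the same three counts.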
   
   I removed the use of streams and replaced them with a classic iteration over the smaller
of the two sets of keys to get the intersection.
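
   The iteration above could look roughly like the following sketch. It assumes each input is
held as a map from element to occurrence count, and the class and method names
(`IntersectionLoopSketch`, `intersectionSize`) are hypothetical rather than taken from the PR.

```java
import java.util.Map;

final class IntersectionLoopSketch {
    // Counts shared elements by walking the keys of the smaller map only
    // and looking each key up in the larger map.
    static <T> int intersectionSize(Map<T, Integer> a, Map<T, Integer> b) {
        Map<T, Integer> smaller = a.size() <= b.size() ? a : b;
        Map<T, Integer> larger = (smaller == a) ? b : a;

        int total = 0;
        for (Map.Entry<T, Integer> entry : smaller.entrySet()) {
            Integer other = larger.get(entry.getKey());
            if (other != null) {
                // For a bag (multiset), an element is shared as many times
                // as it occurs in both inputs.
                total += Math.min(entry.getValue(), other);
            }
        }
        return total;
    }
}
```

   Walking the smaller key set bounds the number of lookups by the size of the smaller input.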
   
   This is now a generic set similarity which only requires a function to split a `CharSequence`
into elements. A place to provide such functions, like those in the example unit tests, is best left to
a separate block of new functionality.
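
   As an example of the kind of splitting function meant here, the sketch below splits a
`CharSequence` into character bigrams and counts the shared elements. `BIGRAMS`,
`sharedElements` and `SplitterSketch` are hypothetical names for illustration and do not
reflect the API in the PR.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

final class SplitterSketch {
    // Hypothetical splitter: the set of all adjacent character pairs.
    static final Function<CharSequence, Set<String>> BIGRAMS = cs -> {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 2 <= cs.length(); i++) {
            out.add(cs.subSequence(i, i + 2).toString());
        }
        return out;
    };

    // Generic over the element type: only the splitter knows how text
    // becomes elements.
    static <T> int sharedElements(Function<CharSequence, Set<T>> splitter,
                                  CharSequence left, CharSequence right) {
        Set<T> a = splitter.apply(left);
        Set<T> b = splitter.apply(right);
        Set<T> smaller = a.size() <= b.size() ? a : b;
        Set<T> larger = (smaller == a) ? b : a;
        int count = 0;
        for (T element : smaller) {
            if (larger.contains(element)) {
                count++;
            }
        }
        return count;
    }
}
```

   With this splitter, "night" and "nacht" share only the bigram "ht". A word-based splitter
could be passed in the same way without changing the comparison code.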
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
