commons-issues mailing list archives

From GitBox <>
Subject [GitHub] [commons-text] kinow commented on issue #103: TEXT-126: Adding Sorensen-Dice similarity algorithm
Date Sat, 09 Mar 2019 04:24:33 GMT
kinow commented on issue #103: TEXT-126: Adding Sorensen-Dice similarity algorithm
   @ameyjadiye, see the last comment from @aherbert about empty strings and whether the similarity should be `0` or `1`.
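For reference, a common way to compute the Sorensen-Dice coefficient on strings is over character bigrams: 2|X ∩ Y| / (|X| + |Y|), where X and Y are the bigram multisets of the two inputs. The empty-string question comes up because two inputs with no bigrams (empty or single-character strings) make the denominator zero. This is a minimal sketch, not the code from this pull request. The class name `DiceSketch` is hypothetical, and returning `1.0` for identical strings (including two empty strings) and `0.0` when no bigrams exist otherwise is an assumed convention, not a decision from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch class; not the API proposed in the pull request.
final class DiceSketch {

    // Counts adjacent character pairs ("bigrams"); repeats are kept as a multiset.
    static Map<String, Integer> bigrams(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            counts.merge(s.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    // Sorensen-Dice over bigram multisets: 2 * |X ∩ Y| / (|X| + |Y|).
    static double similarity(String left, String right) {
        // Assumed convention: identical strings, including "" vs "", score 1.0.
        if (left.equals(right)) {
            return 1.0;
        }
        Map<String, Integer> x = bigrams(left);
        Map<String, Integer> y = bigrams(right);
        int total = 0;
        for (int c : x.values()) total += c;
        for (int c : y.values()) total += c;
        // Assumed convention: different strings with no bigrams at all (e.g. "a" vs "b") score 0.0.
        if (total == 0) {
            return 0.0;
        }
        // Multiset intersection: take the smaller count of each shared bigram.
        int common = 0;
        for (Map.Entry<String, Integer> e : x.entrySet()) {
            common += Math.min(e.getValue(), y.getOrDefault(e.getKey(), 0));
        }
        return 2.0 * common / total;
    }
}
```

With these conventions, `"night"` and `"nacht"` share only the bigram `ht`, giving 2·1 / (4 + 4) = 0.25.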
   @aherbert, while we are discussing #109, do you think that is a blocker for this pull request?
So far I think at least the API proposed here would be kept, right?
   If so, this could be merged once the last comment is resolved, and then we can discuss
how to organize the classes and where the Sorensen-Dice coefficient is calculated.
   I think the only thing missing is a decision on the class names: should they include
`Bigram` in the name, or just be `SorensenDiceSimilarity`?
   I like the idea of a descriptive name such as `BigramSorensenDiceSimilarity` (or with
`Bigram` in another position). However, I think we should also consider what users would
expect, i.e. in other libraries, is the Sorensen-Dice similarity always computed over bigrams?
If other implementations in Python/JS/Java use bigrams, then we could keep `SorensenDiceSimilarity`
and either add another method/constructor/etc. to customize the similarity, or else have another
   What do you think?
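One way to picture the "constructor to customize" option: keep a single class and make the gram size a constructor parameter that defaults to bigrams, the size discussed in this thread. This is a minimal sketch under that assumption; the class name `NGramDiceSketch`, its constructors, and the `apply` method are hypothetical and are not the API from the pull request. The empty-input conventions are the same assumed ones as above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class illustrating a configurable n-gram size; not the PR's API.
final class NGramDiceSketch {
    private final int n;

    // Default to bigrams.
    NGramDiceSketch() {
        this(2);
    }

    NGramDiceSketch(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.n = n;
    }

    // Counts substrings of length n (UTF-16 chars, not code points) as a multiset.
    private Map<String, Integer> grams(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= s.length(); i++) {
            counts.merge(s.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    // Sorensen-Dice over n-gram multisets: 2 * |X ∩ Y| / (|X| + |Y|).
    double apply(String left, String right) {
        // Assumed convention: identical strings score 1.0.
        if (left.equals(right)) {
            return 1.0;
        }
        Map<String, Integer> x = grams(left);
        Map<String, Integer> y = grams(right);
        int total = 0;
        for (int c : x.values()) total += c;
        for (int c : y.values()) total += c;
        // Assumed convention: different strings with no n-grams score 0.0.
        if (total == 0) {
            return 0.0;
        }
        int common = 0;
        for (Map.Entry<String, Integer> e : x.entrySet()) {
            common += Math.min(e.getValue(), y.getOrDefault(e.getKey(), 0));
        }
        return 2.0 * common / total;
    }
}
```

For example, `new NGramDiceSketch(1).apply("night", "nacht")` compares single characters (`n`, `h`, `t` in common) and returns 0.6, while the default bigram instance returns 0.25. A separately named class per gram size would be the alternative design.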

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services
