commons-dev mailing list archives

From don jeba <donj...@yahoo.com.INVALID>
Subject [TEXT-2] Add Jaccard Index and Jaccard Distance
Date Wed, 16 Nov 2016 16:06:06 GMT
Hello,
I am planning to work on ticket TEXT-2 and need your guidance on naming and placing the
class files for the implementation.
The ask in the ticket is to get Jaccard Index [measures similarity] and Jaccard Distance [measures
dissimilarity].
Below is what I am planning to do (option 1).
Add a new class JaccardBase under package org.apache.commons.text. It will hold the logic to
calculate both the index and the distance. Since Jaccard distance is 1 - Jaccard index,
neither needs separate logic, so I plan to keep the calculation in one common place.
Add a new class JaccardIndex under package org.apache.commons.text.similarity. It will be
derived from JaccardBase and will expose a public function that returns the Jaccard index.
Similarly, add a new class JaccardDistance under package org.apache.commons.text.diff. It
will be derived from JaccardBase and will expose a public function that returns the Jaccard
distance.
The advantage is that there is no code duplication. The disadvantage is that a caller who
wants both the index and the distance needs to call two separate functions (one from the
JaccardIndex class and one from the JaccardDistance class), so the calculation is done twice
for the same input.
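
A minimal sketch of how option 1 could look. The class names come from the proposal above; the method names (calculateIndex, apply), the choice to compare distinct characters, and the handling of two empty inputs are assumptions made only for illustration. The three classes are shown in one file for brevity, although the proposal places them in different packages.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical base class: holds the one shared calculation.
abstract class JaccardBase {

    // Jaccard index = (size of intersection) / (size of union),
    // computed here over the distinct characters of each input (an assumption).
    protected double calculateIndex(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        Set<Character> a = toCharSet(left);
        Set<Character> b = toCharSet(right);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // assumption: two empty inputs count as identical
        }
        Set<Character> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    private static Set<Character> toCharSet(CharSequence s) {
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            set.add(s.charAt(i));
        }
        return set;
    }
}

// Exposes only the similarity value.
class JaccardIndex extends JaccardBase {
    public double apply(CharSequence left, CharSequence right) {
        return calculateIndex(left, right);
    }
}

// Exposes only the dissimilarity value, derived as 1 - index.
class JaccardDistance extends JaccardBase {
    public double apply(CharSequence left, CharSequence right) {
        return 1.0 - calculateIndex(left, right);
    }
}
```

With this layout, a caller who needs both values for "abc" and "bcd" calls new JaccardIndex().apply(...) and new JaccardDistance().apply(...), and the set intersection is computed once per call, which is the duplicated work described above.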

Another option (option 2) is to have a single class that returns both the index and the
distance. With this option, I have 2 questions:
1. Where should the new class live (under which package)?
2. What should the new class be named?
The advantage is that the disadvantage of option 1 is fixed here.
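
A minimal sketch of the single-class alternative. The names JaccardMeasure and Result are placeholders (naming is exactly the open question above), and the character-set comparison and empty-input handling are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical single class: one calculation, both values returned together.
final class JaccardMeasure {

    // Immutable holder; the distance is derived from the stored index.
    static final class Result {
        private final double index;

        Result(double index) {
            this.index = index;
        }

        double getIndex() {
            return index;
        }

        double getDistance() {
            return 1.0 - index;
        }
    }

    Result apply(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        Set<Character> a = toCharSet(left);
        Set<Character> b = toCharSet(right);
        if (a.isEmpty() && b.isEmpty()) {
            return new Result(1.0); // assumption: two empty inputs count as identical
        }
        Set<Character> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return new Result((double) intersection.size() / unionSize);
    }

    private static Set<Character> toCharSet(CharSequence s) {
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            set.add(s.charAt(i));
        }
        return set;
    }
}
```

Here one call to apply(...) yields both getIndex() and getDistance(), so the inputs are processed only once.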

I personally prefer option 1, as it looks cleaner given the way the classes are arranged
in the packages.
Could you kindly review and share your thoughts?
Do let me know if I am not clear.
Thank you,
Regards,
Don Jeba