commons-user mailing list archives

From Rob Tompkins <chtom...@apache.org>
Subject Re: [text] Longest common subsequence wrong result?
Date Fri, 31 Mar 2017 11:34:29 GMT
Hello Sébastien,

From what I can tell, this is the expected behaviour. I think it hinges on the definition
of “subsequence” differing from the definition of “substring.” By this I mean that
a subsequence is an ordered list of elements obtained by deleting some (possibly zero)
elements from the original list, without reordering those that remain. A substring is a
subsequence with one further restriction: the characters that remain must have been
adjacent in the original character list.

So, in your example, “Gandalf” and “Sauron” share the subsequence {a, n}, which has length 2.
But if we were to restrict this to substrings, then the longest common substring would simply
be {a} (or, equally, {n}), of length 1.
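
To make the difference concrete, here is a rough sketch of both computations using the textbook dynamic-programming approach. This is not the commons-text implementation; the class name LcsDemo and both method names are made up for the illustration.

```java
final class LcsDemo {

    // Length of the longest common subsequence: matching characters must
    // appear in the same order in both inputs, but need not be adjacent.
    static int longestCommonSubsequenceLength(CharSequence a, CharSequence b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Extend the best subsequence of the two shorter prefixes.
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    // Skip one character from either input.
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length()][b.length()];
    }

    // Length of the longest common substring: matching characters must
    // be adjacent in both inputs.
    static int longestCommonSubstringLength(CharSequence a, CharSequence b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // A run of adjacent matches ending at (i, j).
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    best = Math.max(best, dp[i][j]);
                }
                // On a mismatch the run is broken, so dp[i][j] stays 0.
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubsequenceLength("Gandalf", "Sauron")); // 2 ("an")
        System.out.println(longestCommonSubstringLength("Gandalf", "Sauron"));   // 1 ("a" or "n")
        System.out.println(longestCommonSubsequenceLength("xxx", "yyy"));        // 0
    }
}
```

The only difference between the two methods is what happens on a mismatch: the subsequence version carries the best result forward, while the substring version resets the run to zero.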

I’ve tried to spell this out in the javadoc here (http://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/similarity/LongestCommonSubsequence.html#logestCommonSubsequence-java.lang.CharSequence-java.lang.CharSequence-),
but I suppose I should have been clearer in the documentation. 

Do let me know if you think there’s a better way to present these details.

Many thanks and all the best,
-Rob

> On Mar 31, 2017, at 7:16 AM, Sébastien Piller <me@sebpiller.ch> wrote:
> 
> Hi all,
> If I call
> new LongestCommonSubsequence ().apply ("xxx","yyy")
> I get 0 (correct)
> If I call 
> new LongestCommonSubsequence ().apply ("Gandalf","Sauron")
> I get 2, which looks incorrect to me (should have got 1, since there is no sequence of
> 2 chars in both strings). Is it a bug or expected behavior?
> Thanks
> 
> Sent from my Samsung Galaxy smartphone.

