cocoon-dev mailing list archives

From Ugo Cei <>
Subject Re: [OT] Determining the similarity between a pair of texts
Date Wed, 15 Jun 2005 15:26:17 GMT
On 15 Jun 2005, at 16:32, Tony Collen wrote:

> Ugo,
> I think what you're looking for is the Levenshtein Distance Algorithm.
> hl=en&q=java+Levenshtein+implementation&btnG=Google+Search

Nice! I also found an implementation nearby: 


However, this algorithm is useful for finding single-character  
differences, whereas I am more interested in word differences. IOW, the  
LD between "test" and "tent" is 1 and the LD between "test" and "barf"  
is 4, but for my purpose it should be 1 in both cases. And the LD  
between "test case" and "tent base" is smaller than the one between  
"test case" and "case under test", but I need it to be the reverse.
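
A minimal Python sketch of the character-level vs word-level distinction, using the standard dynamic-programming Levenshtein distance. This is an illustration, not one of the implementations linked above. `levenshtein` and `word_levenshtein` are hypothetical names, and splitting words on plain whitespace is an assumption. Because the function accepts any sequence, it can compare characters (strings) or words (lists of tokens):

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x -> y (or match)
        prev = curr
    return prev[-1]


def word_levenshtein(s, t):
    """Levenshtein distance over whitespace-separated words (assumed tokenization)."""
    return levenshtein(s.split(), t.split())


print(levenshtein("test", "tent"), levenshtein("test", "barf"))            # 1 4
print(word_levenshtein("test", "tent"), word_levenshtein("test", "barf"))  # 1 1
print(word_levenshtein("test case", "tent base"))                          # 2
print(word_levenshtein("test case", "case under test"))                    # 3
```

Counting whole words makes "test" vs "barf" a single substitution, as wanted. However, word-level edit distance is still order-sensitive, so "tent base" (2) still ranks closer to "test case" than "case under test" (3) does.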

Actually, what I am trying to come up with is an algorithm for determining
whether two texts refer (more or less) to similar subjects.
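
One common technique for topic similarity ignores word order entirely: treat each text as a bag of words and compare the word-count vectors with cosine similarity. This technique is not proposed in the thread. The following is a minimal sketch that assumes lowercasing and whitespace tokenization, with no stemming, stop-word removal, or weighting. `cosine_similarity` is a hypothetical helper name:

```python
import math
from collections import Counter


def cosine_similarity(s, t):
    """Cosine of the angle between word-count vectors.

    1.0 means the same words in the same proportions; 0.0 means no shared words.
    Word order is ignored (bag-of-words assumption).
    """
    a = Counter(s.lower().split())
    b = Counter(t.lower().split())
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


print(cosine_similarity("test case", "tent base"))        # 0.0
print(cosine_similarity("test case", "case under test"))  # ~0.816 (2 / sqrt(6))
```

Unlike edit distance, this ranks "case under test" as closer to "test case" than "tent base". For longer texts, stop-word removal, stemming, and TF-IDF weighting are usually added so that very common words do not dominate the score.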


Ugo Cei
Tech Blog:
Open Source Zone:
Wine & Food Blog:
