mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Approximate String matching
Date Thu, 25 Jun 2009 11:14:19 GMT
Could you set up the Lucene spell checker and use that?  It has
pluggable distance measures, one being edit distance.  You might have
to implement your own variation that skips transposition.
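
For reference, here is a minimal sketch of an edit distance that counts
only insertion, deletion, and substitution, and gives up early once the
distance must exceed "d". It is plain Python, not Lucene's API; the
names edit_distance_within and find_matches are made up for
illustration, and the linear scan over the corpus is an assumption for
brevity rather than a trie-based approach.

```python
def edit_distance_within(pattern, candidate, d):
    """Return the edit distance (insert/delete/substitute only) between
    pattern and candidate if it is <= d, otherwise None."""
    # The length difference is a lower bound on the distance.
    if abs(len(pattern) - len(candidate)) > d:
        return None

    # prev[j] = distance between the empty prefix of pattern and candidate[:j]
    prev = list(range(len(candidate) + 1))
    for i, pc in enumerate(pattern, start=1):
        curr = [i]  # distance between pattern[:i] and the empty string
        for j, cc in enumerate(candidate, start=1):
            cost = 0 if pc == cc else 1
            curr.append(min(
                prev[j] + 1,         # delete pc from pattern
                curr[j - 1] + 1,     # insert cc into pattern
                prev[j - 1] + cost,  # substitute (or match)
            ))
        # Row minima never decrease, so once every cell exceeds d
        # the final distance must exceed d as well.
        if min(curr) > d:
            return None
        prev = curr

    return prev[-1] if prev[-1] <= d else None


def find_matches(pattern, corpus, d):
    """Return all strings in corpus within d operations of pattern."""
    return [s for s in corpus if edit_distance_within(pattern, s, d) is not None]
```

Because transposition is not a single operation here, "ab" and "ba" are
two operations apart, so find_matches("ab", ["ba"], 1) returns an empty
list.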

On Jun 25, 2009, at 12:19 AM, prasenjit mukherjee wrote:

> Gents,
>    Please accept my apologies if you think this may not be the correct
> forum. I am trying to find a solution for approximate string matching,
> where I need to find all strings from a corpus that differ from a given
> pattern by at most "d" operations. The allowed operations are
> insertion, deletion, and substitution. Yes, I am not interested in
> transposition, as it could be very expensive.
>
> I looked into LingPipe; they have a trie-based solution in some class
> called Aproximate*Chunker*. Does anybody have a better approach?
>
> -Thanks,
> Prasenjit

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

