mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Document Comparison with Mahout
Date Thu, 08 Jul 2010 11:41:06 GMT

On Jul 8, 2010, at 2:21 AM, JAGANADH G wrote:

> On Wed, Jul 7, 2010 at 11:49 PM, Grant Ingersoll <gsingers@apache.org> wrote:
> 
>> How do you want to determine what counts as a copy?  Strictly or loosely?  Solr and Nutch
>> have some deduplication capabilities, including fuzzy matching.  They
>> probably could be brought into Mahout, too.
>> 
>> -Grant
>> 
>> 
>> 
> Dear Grant
> I am trying to make a strict match.
> I will try Solr and Nutch.

So, then you can do a checksum or something like that, right?
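
For a strict match, the checksum idea could look roughly like the sketch below. It is only an illustration and not something Mahout, Solr, or Nutch ships: the function names, the choice of SHA-256, and the sample documents are all assumptions. It hashes each document's exact text and groups document ids that share a digest, so any difference at all (even whitespace) makes two documents count as distinct.

```python
import hashlib
from collections import defaultdict


def checksum(text):
    """Return a hex digest of the exact document text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_exact_duplicates(docs):
    """Group document ids whose text is byte-for-byte identical.

    `docs` is a hypothetical mapping of document id -> document text.
    Returns a list of id groups, one per set of identical documents.
    """
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        groups[checksum(text)].append(doc_id)
    # Keep only digests shared by more than one document.
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = {
        "doc1": "the quick brown fox",
        "doc2": "the quick brown fox",
        "doc3": "the quick brown fox ",  # trailing space, so not a strict match
    }
    print(find_exact_duplicates(sample))
```

If trivial differences such as case or whitespace should not matter, the text would need to be normalized before hashing, at which point the match is no longer fully strict.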

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search

