mahout-user mailing list archives

From Patrick Collins <patrick.coll...@ready2sign.com>
Subject Re: Fuzzy matching
Date Sun, 01 May 2011 06:21:38 GMT
Should I be worried that somebody with a scientology.net email address is
writing in about address harvesting and data deduping?

Patrick.

On Fri, Apr 29, 2011 at 12:50 PM, James Pettyjohn <jamesp@scientology.net> wrote:

>
>
> Hey,
>
> First time writing in.
>
> I have around 6 million active records in a contacts database, plus
> millions of additional historical address records for those contacts. I
> have just received 60+ thousand new records, not yet correlated with
> these, that I need to fuzzy match against both the active and the
> historical records.
>
> I will need to do the same thing with the database against itself for
> de-duplication later. The data is primarily in Oracle (with the
> supplement in CSV files).
>
> I saw the Booz Allen Hamilton presentation on fuzzy matching, but I don't
> see any distributions of that implementation. At the same time, I don't
> need real-time queries, just batch processing at the moment.
>
> I thought Mahout might fit the bill. Any comments on approach
> would be appreciated.
>
> Best, James
