hadoop-common-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: The statistical spam filtering
Date Wed, 08 Oct 2008 07:28:52 GMT
Steve, Thanks for your information!!

I looked into Bayesian filtering, and I can easily test it on
a distributed system; expressing it as map/reduce is straightforward.

See http://blog.udanax.org/2008/10/parallel-bayesian-spam-filtering-using.html
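
As a rough illustration of why the counting part fits map/reduce, here is a
minimal sketch in plain Python. It is not Hadoop code and not the code from
the blog post above: the function names, the whitespace tokenizer, the
add-one smoothing, and the toy corpus are all assumptions made up for the
example. The map step emits one count per (label, token) pair, the reduce
step sums them, and the summed counts feed a naive Bayes log-odds score.

```python
import math
from collections import defaultdict
from itertools import groupby


def map_tokens(label, text):
    """Map step: emit ((label, token), 1) for every token in one message."""
    for token in text.lower().split():
        yield (label, token), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each (label, token) key."""
    counts = {}
    # Sorting stands in for the shuffle phase that groups equal keys.
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        counts[key] = sum(value for _, value in group)
    return counts


def train(corpus):
    """Run map and reduce over (label, text) pairs; return model statistics."""
    pairs = [kv for label, text in corpus for kv in map_tokens(label, text)]
    counts = reduce_counts(pairs)
    totals = defaultdict(int)  # total token occurrences per label
    for (label, _token), n in counts.items():
        totals[label] += n
    vocab_size = len({token for _label, token in counts})
    return counts, totals, vocab_size


def spam_log_odds(text, counts, totals, vocab_size):
    """Naive Bayes log-odds with add-one smoothing; above 0 leans spam.

    Class priors are ignored, i.e. spam and ham are assumed equally likely.
    """
    score = 0.0
    for token in text.lower().split():
        p_spam = (counts.get(("spam", token), 0) + 1) / (totals["spam"] + vocab_size)
        p_ham = (counts.get(("ham", token), 0) + 1) / (totals["ham"] + vocab_size)
        score += math.log(p_spam / p_ham)
    return score


# Hypothetical toy corpus, only to exercise the functions.
corpus = [
    ("spam", "cheap pills buy now"),
    ("spam", "buy cheap watches now"),
    ("ham", "meeting notes for the hadoop release"),
    ("ham", "patch review for the hadoop build"),
]
counts, totals, vocab_size = train(corpus)
```

With the statistics in hand, `spam_log_odds("buy cheap pills", counts, totals, vocab_size)`
scores a new message. On a real cluster the map and reduce functions would run
as separate distributed tasks over the mail corpus; only the shape of the
computation is shown here.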


On Mon, Sep 22, 2008 at 7:21 PM, Steve Loughran <stevel@apache.org> wrote:
> Edward J. Yoon wrote:
>> Hi all,
>> To reduce the manual administration effort for a planet-scale
>> mail service, I'm considering statistical spam filtering with
>> the SpamAssassin, Hadoop (distributed computing), and Hama (parallel
>> matrix computing) projects.
>> Any advice (or experience) would be appreciated!
> Have you spoken to SpamAssassin? They'd probably love to get involved in a
> streams-based filtering system. One thing to know there is that a lot of
> their test data is private, as they have to include lots of legitimate email
> alongside the spam, so their big datasets aren't always that public.
> Talk to Justin Mason and the SpamAssassin developers.
> -steve

Best regards, Edward J. Yoon
