Date: Thu, 20 Mar 2014 16:29:14 +0530
Subject: Building a reputation analysis engine for email using Mahout
From: Dileepa Jayakody
To: user@mahout.apache.org

Hi All,

My name is Dileepa Jayakody, and I am an MSc research student at the University of Moratuwa, Sri Lanka. My research project (ReputationBox) is about predicting the goodness of incoming emails (based on a calculated reputation score) by analysing previous email conversations, email correspondents, their interests, etc. I think it will work much like a recommendation engine for email, rating and classifying incoming emails based on a reputation score.

The basic flow of my application is as follows:

1. The user authorizes my application, ReputationBox, to connect to their mailbox and read email.
2. ReputationBox performs an initial reputation analysis to build a reputation index over the past emails, which are imported as a batch. (This initial reputation index will be used as the training data for analysing new incoming emails.)
3. New emails are polled by or pushed to the ReputationBox server, and reputation analysis is performed in real time to predict their reputation.
4. The email reputation data is stored in the application.
5. The ReputationBox client web app presents the reputation data of the new emails. (Based on the reputation data, the client could be implemented as a priority inbox, spam filter, email categorizer, etc.)

I would like to seek advice on how to develop the reputation-analysis component of my application using Apache Mahout. I'm looking at the people, topics and actions mentioned in an email to derive the reputation.
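
To make steps 2 and 3 of the flow more concrete, here is a rough sketch in plain Python. It is not Mahout code and not a settled design. The names (Email, ReputationIndex), the "replied" flag used as the training signal, and the equal weighting of people and topics are all assumptions made only for illustration; actions are left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Email:
    sender: str
    topics: list = field(default_factory=list)  # e.g. keywords from subject/body
    replied: bool = False  # hypothetical signal: did the user reply to this email?


class ReputationIndex:
    """Toy reputation index holding per-sender and per-topic reply rates."""

    def __init__(self):
        # key -> [replied_count, total_count]
        self.people = defaultdict(lambda: [0, 0])
        self.topics = defaultdict(lambda: [0, 0])

    def train(self, past_emails):
        """Step 2: build the index from a batch of past emails."""
        for mail in past_emails:
            self._update(self.people[mail.sender], mail.replied)
            for topic in mail.topics:
                self._update(self.topics[topic], mail.replied)

    @staticmethod
    def _update(counts, replied):
        counts[0] += int(replied)
        counts[1] += 1

    @staticmethod
    def _rate(counts):
        # Laplace smoothing, so an unseen sender or topic scores a neutral 0.5
        return (counts[0] + 1) / (counts[1] + 2)

    def score(self, mail):
        """Step 3: predict a reputation score in (0, 1) for a new email."""
        person = self._rate(self.people.get(mail.sender, [0, 0]))
        if mail.topics:
            topic = sum(
                self._rate(self.topics.get(t, [0, 0])) for t in mail.topics
            ) / len(mail.topics)
        else:
            topic = 0.5
        # Assumed equal weighting of the people and topic components
        return 0.5 * person + 0.5 * topic
```

A client could then sort or filter new emails by score(), for example treating high scores as priority-inbox candidates and low scores as likely spam. What I am trying to find out is which Mahout components could take the place of this hand-rolled scoring.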

The high-level architecture diagram of the ReputationBox system is at [1].

I also plan to deploy my application on Google App Engine (GAE). Is Mahout deployable on GAE? I'm also planning to use Apache Isis to develop ReputationBox as a domain-driven application.

This is a proposed project for GSoC. For more information on my application, please see the JIRA issue [2].

Looking forward to your suggestions.

Thanks,
Dileepa

[1] https://issues.apache.org/jira/secure/attachment/12634802/EmailReputationSystem_v2.png
[2] https://issues.apache.org/jira/browse/ISIS-736