Return-Path: X-Original-To: apmail-incubator-hama-user-archive@minotaur.apache.org Delivered-To: apmail-incubator-hama-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 136667A96 for ; Fri, 2 Sep 2011 15:08:18 +0000 (UTC) Received: (qmail 10674 invoked by uid 500); 2 Sep 2011 15:08:17 -0000 Delivered-To: apmail-incubator-hama-user-archive@incubator.apache.org Received: (qmail 10662 invoked by uid 500); 2 Sep 2011 15:08:17 -0000 Mailing-List: contact hama-user-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hama-user@incubator.apache.org Delivered-To: mailing list hama-user@incubator.apache.org Received: (qmail 10654 invoked by uid 99); 2 Sep 2011 15:08:17 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Sep 2011 15:08:17 +0000 X-ASF-Spam-Status: No, hits=4.0 required=5.0 tests=FREEMAIL_FROM,FREEMAIL_REPLY,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of thomas.jungblut@googlemail.com designates 209.85.212.47 as permitted sender) Received: from [209.85.212.47] (HELO mail-vw0-f47.google.com) (209.85.212.47) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 02 Sep 2011 15:08:12 +0000 Received: by vwe42 with SMTP id 42so2431385vwe.6 for ; Fri, 02 Sep 2011 08:07:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=myitNQ/W9u2qcVUevdQBwxR2tv+EnRgtCsEpz3Qa0HQ=; b=aQtO626dzIXp+7FVgzLqYauvGBVec9B2BlDs5Rs2PNUjMGB6CU513S+c7Zd1uXtGGf tBdpbYUsOPSsthT68hMIuxbIQ3O1e1CXh5uXgQWq7wlnoRFKiliLNLdes5WJtePTFCrZ rempJjyQBSKZM5S+h04Vp6C0PxzPpRHPGTem0= MIME-Version: 1.0 Received: by 10.52.173.172 with SMTP id bl12mr1153305vdc.242.1314976071921; Fri, 02 Sep 2011 08:07:51 -0700 (PDT) Received: by 10.52.184.163 with HTTP; Fri, 2 Sep 2011 08:07:51 -0700 (PDT) In-Reply-To: References: Date: Fri, 2 Sep 2011 17:07:51 +0200 Message-ID: Subject: Re: About SVM Stochastic Gradient Descent (SGD) on BSP model From: Thomas Jungblut To: hama-user@incubator.apache.org Content-Type: multipart/alternative; boundary=bcaec51b1659a7aea804abf6b9cc --bcaec51b1659a7aea804abf6b9cc Content-Type: text/plain; charset=ISO-8859-1 Hi Joe, great thank you very much for clarification. I love classification algorithms, so I'll be very interested in how you develop this. "Per se" you can translate every MapReduce algorithm to BSP, since BSP is an abstraction to MapReduce. E.G: Map Phase is a local computation phase, merging and sorting are the synchronization barrier (needs finish of all map tasks) and reducing is a computational phase again. On the english wikipedia is a good schema that shows how the workflow is. Actually you can make your map phase as well in BSP, but for the latest release 0.3.0 you have to write data sharding and partitioning for yourself. There are examples and blog posts that shows how code them. Your reduce step is depending on your implementation. Is there a single reducer which updates the whole classifier? Actually I wanted to implement a k-means clustering in BSP, but sadly I was very busy and have not too much time for it. It is quite similar to your algorithm. The Map step is calculating the distance between the current point and the centers and the reducer is going to update the centers. To provide you with a bit of information, I already rewritten an MapReduce graph algorithm to BSP. [1][2] These examples are without partitioning, I recently did an improvement to the partitioning algorithm. So it makes sense to checkout the current trunk and browse through the graph package and examples package. It contains improved partitioning as well as graph examples. HF and GL. If you need help, I'll be glad to help you. [1] http://codingwiththomas.blogspot.com/2011/04/graph-exploration-with-hadoop-mapreduce.html [2] http://codingwiththomas.blogspot.com/2011/04/graph-exploration-using-apache-hama-and.html 2011/9/2 Zhiyong Xie > Thank you Thomas! In short, SVM ( > http://en.wikipedia.org/wiki/Support_vector_machine) is a supervised > learning classifier described as optimization problem and solved by > gradient > descent approach (http://en.wikipedia.org/wiki/Stochastic_gradient_descent > ). > It is a iterative process, and kinda run a map/reduce pair per iteration. > Map to calculate the gradient value for each point, and reduce phase to > optimize the classifier. BSP model seems native for scientific and graph > processing in my mind, not figure out or find much info online for this > type > of application so far . > > Best, > Joe > > On Thu, Sep 1, 2011 at 10:36 AM, Thomas Jungblut < > thomas.jungblut@googlemail.com> wrote: > > > Hi Joe, > > > > for non-insiders, would you please clarify what SGD and SVM are? > > Then we could give you some tips how to implement them in BSP. > > > > Greetz, > > Thomas > > > > 2011/9/1 Zhiyong Xie > > > > > Hi there, > > > > > > May I ask whether anyone else have look into the SGD mapping on BSP > model > > > too? I'm investigating whether BSP model is a good candidate for > > > implementing distributed version of SVM SGD. > > > > > > Thanks! > > > Joe > > > -- > > > Joe (Zhiyong) Xie > > > Graduate Student > > > > > > > > > > > -- > > Thomas Jungblut > > Berlin > > > > mobile: 0170-3081070 > > > > business: thomas.jungblut@testberichte.de > > private: thomas.jungblut@gmail.com > > > > > > -- > Joe (Zhiyong) Xie > Graduate Student > -- Thomas Jungblut Berlin mobile: 0170-3081070 business: thomas.jungblut@testberichte.de private: thomas.jungblut@gmail.com --bcaec51b1659a7aea804abf6b9cc--