Reply-To: dev@hama.apache.org
MIME-Version: 1.0
References: <39738745-4AA7-4574-B338-F8AA3B796D59@gmx.de>
Date: Fri, 25 May 2012 19:24:33 +0200
Subject: Re: Online machine learning on top of Hama BSP
From: Thomas Jungblut
To: dev@mahout.apache.org
Cc: hama-dev@incubator.apache.org
Content-Type: multipart/alternative; boundary=14dae9cfcba047d5ca04c0dfa486

--14dae9cfcba047d5ca04c0dfa486
Content-Type: text/plain; charset=ISO-8859-1

Hi Ted,

Giraph offers a graph layer that internally uses BSP on top of MapReduce.
You don't have access to the BSP primitives, so you need to treat every
machine learning problem as a graph problem, which may be very inconvenient
in many cases.

2012/5/25 Ted Dunning

> Apache Giraph probably offers a more mature BSP model of computation. My
> guess is that it would make a stronger implementation substrate. It
> certainly has a very strong community.
>
> On Fri, May 25, 2012 at 10:44 AM, Thomas Jungblut <
> thomas.jungblut@googlemail.com> wrote:
>
> > Hi Manuel,
> >
> > 300k is small, I have one with 6 million clicks.
> > However, it is more a question of interest and of which algorithms could
> > be suitable for BSP.
> > In case you wonder what BSP is, it stands for bulk synchronous parallel
> > [1].
> > We think that realtime and strongly iterative algorithms that are slow in
> > MapReduce could be solved more efficiently with BSP.
> > If you're interested, let us know.
> >
> > Regards,
> > Thomas
> >
> > [1] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
> >
> > 2012/5/25 Manuel Blechschmidt
> >
> > > Hi Edward,
> > > do you already have a test dataset?
> > >
> > > I might get one with about 300,000 clicks for you.
> > >
> > > It is from www.nelou.com and we are already running a recommender in
> > > preview mode:
> > >
> > > http://www.nelou.com/artikel-803746/Overall-von-mysuro#__apaxoPreviewMode
> > >
> > > It could be the case that you would have to sign an NDA. Would this be
> > > possible for you?
> > >
> > > /Manuel
> > >
> > > On 25.05.2012, at 10:34, Edward J. Yoon wrote:
> > >
> > > > Okay, I'm forwarding this to mahout dev.
> > > >
> > > > I'm planning to create a project related to online machine learning,
> > > > as an Apache Hama sub-module, since the graph of message queues and
> > > > workers could be implemented using BSP (see also [1]). The first idea
> > > > is an online recommendation system based on click-stream data.
> > > >
> > > > If you are interested in this plan, let's talk together here.
> > > >
> > > > 1.
> > > > http://codingwiththomas.blogspot.com/2011/10/apache-hama-realtime-processing.html
> > > >
> > > > ---------- Forwarded message ----------
> > > > From: Thomas Jungblut
> > > > Date: Fri, May 25, 2012 at 4:55 PM
> > > > Subject: Re: Online machine learning on top of Hama BSP
> > > > To: dev@hama.apache.org
> > > >
> > > >
> > > > Should we cooperate with the Mahout guys on this? I'm pretty sure they
> > > > would have fun with it.
> > > > Edward, do you want to ask them?
> > > >
> > > > 2012/5/25 Tommaso Teofili
> > > >
> > > >> Do you have a plan for that, Edward?
> > > >> A separate package in examples or a separate (online) machine learning
> > > >> module? Or something else?
> > > >> Regards
> > > >> Tommaso
> > > >>
> > > >> 2012/5/25 Edward J. Yoon
> > > >>
> > > >>> Okay, then let's get started.
> > > >>>
> > > >>> My first idea is a simple online recommendation system based on
> > > >>> click-stream data.
> > > >>>
> > > >>> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati wrote:
> > > >>>> +1
> > > >>>>
> > > >>>> For those who are interested in ML, please check this. GNU Octave is
> > > >>>> used.
> > > >>>>
> > > >>>> https://www.coursera.org/course/ml
> > > >>>>
> > > >>>> Another session is yet to be announced.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Praveen
> > > >>>>
> > > >>>> On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
> > > >>>> thomas.jungblut@googlemail.com> wrote:
> > > >>>>
> > > >>>>> +1
> > > >>>>>
> > > >>>>> 2012/5/24 Tommaso Teofili
> > > >>>>>
> > > >>>>>> and same here :)
> > > >>>>>>
> > > >>>>>> 2012/5/24 Vaijanath Rao
> > > >>>>>>
> > > >>>>>>> +1 me too
> > > >>>>>>> On May 23, 2012 10:26 PM, "Aditya Sarawgi" <sarawgi.aditya@gmail.com>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> +1
> > > >>>>>>>> I would be happy to help :)
> > > >>>>>>>>
> > > >>>>>>>> On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon <edwardyoon@apache.org>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi,
> > > >>>>>>>>>
> > > >>>>>>>>> Is anyone interested in online machine learning?
> > > >>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>>> Best Regards, Edward J. Yoon
> > > >>>>>>>>> @eddieyoon
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> --
> > > >>>>>>>> Cheers,
> > > >>>>>>>> Aditya Sarawgi
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Thomas Jungblut
> > > >>>>> Berlin
> > > >>>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Best Regards, Edward J. Yoon
> > > >>> @eddieyoon
> > > >>>
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Thomas Jungblut
> > > > Berlin
> > > >
> > > >
> > > > --
> > > > Best Regards, Edward J. Yoon
> > > > @eddieyoon
> > >
> > > --
> > > Manuel Blechschmidt
> > > Dortustr. 57
> > > 14467 Potsdam
> > > Mobile: 0173/6322621
> > > Twitter: http://twitter.com/Manuel_B
> > >
> > >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin
> >
>

--
Thomas Jungblut
Berlin

--14dae9cfcba047d5ca04c0dfa486--