hama-dev mailing list archives

From Suraj Menon <surajsme...@apache.org>
Subject Re: Online machine learning on top of Hama BSP
Date Thu, 14 Jun 2012 19:51:42 GMT
Just adding my 2 cents. Thomas, this goes in line with the discussion we
had recently about how Hama should have a superstep library, where each
superstep does something that a potential user (in this case, our machine
learning library) can override and use. A few ideas for the superstep
library (a rough sketch of the first two follows the list):

1. RealTimeSuperstep (extends Superstep but does not sync)
2. MutualBroadcastSuperstep (extends Superstep; used where all the peers
have to send all their messages to each other. We should employ a peer
communication strategy such that every peer does not have to open an RPC
connection with every other peer internally)
3. Mapper and Reducer (I have one WordCount test running for a small set
of data and will need more time to improve its scalability; the first
step of MapReduce would have to use MutualBroadcastSuperstep)
4. OutputCommitter (a Superstep that would write output records to HDFS,
not based on the peer ID)
5. IterativeSuperstep (one that holds static information across
iterations and checkpoints it)
6. ... more expected as we work on new ideas.
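
To make (1) and (2) a bit more concrete, here is a minimal sketch of what
the base class and the mutual broadcast could look like (all the names
here are placeholders, nothing below is a decided API):

import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hama.bsp.BSPPeer;

// Placeholder base class for the proposed superstep library; a user (in
// our case, the machine learning library) overrides compute() with the
// logic to run between two barriers.
abstract class Superstep<M extends Writable> {

  protected abstract void compute(BSPPeer<?, ?, ?, ?, M> peer)
      throws IOException;

  // Idea 1: a RealTimeSuperstep would override this to skip the barrier.
  protected boolean skipSync() {
    return false;
  }
}

// Idea 2: every peer sends all of its messages to every other peer. The
// loop below is the naive strategy; a real implementation should route
// messages (e.g. through a designated peer) so that not every peer has
// to open an RPC connection to every other peer.
abstract class MutualBroadcastSuperstep<M extends Writable>
    extends Superstep<M> {

  protected void broadcast(BSPPeer<?, ?, ?, ?, M> peer, M msg)
      throws IOException {
    for (String other : peer.getAllPeerNames()) {
      peer.send(other, msg);
    }
  }
}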


-Suraj


On Thu, Jun 14, 2012 at 2:45 PM, Thomas Jungblut <
thomas.jungblut@googlemail.com> wrote:

> I have read a bit about batch neural networks and I think I have found a
> viable solution for BSP.
> The funny thing is that it follows the same intuition as my k-means
> clustering.
>
> Each task processes a local block of the data, training a full model for
> itself (making a forward pass and calculating the error of the output
> neurons against the prediction).
> Once you have iterated over all the observations, you send all the
> weights of your neurons and the error (say, the average error over all
> observations) to all the other tasks.
> After sync, each task has #tasks weights for each neuron and the average
> prediction errors; now the weights are accumulated and the backward step
> with the error begins.
> When all weights have been backpropagated on each task, you can start
> reading the whole set of observations again for the next epoch (until
> some minimum average error has been reached or the maximum number of
> epochs has been exceeded).
>
> I don't know if that is a common pattern in machine learning, but it
> seems to me like we can extract some kind of API that helps with
> building local models and combining them again in the next superstep
> with more information (think of the Pregel API with compute, but on the
> task level rather than the vertex level).
>
> What do you think about that?
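>
> A rough sketch of that epoch loop in Hama's BSP API (the model, the local
> training, and the backprop step are only placeholder stubs here):
>
> import java.io.DataInput;
> import java.io.DataOutput;
> import java.io.IOException;
>
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.NullWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.io.Writable;
> import org.apache.hama.bsp.BSP;
> import org.apache.hama.bsp.BSPPeer;
> import org.apache.hama.bsp.sync.SyncException;
>
> // Message carrying one task's weights plus its average prediction error.
> class WeightMessage implements Writable {
>   double[] weights = new double[0];
>   double avgError;
>
>   public void write(DataOutput out) throws IOException {
>     out.writeDouble(avgError);
>     out.writeInt(weights.length);
>     for (double w : weights) out.writeDouble(w);
>   }
>
>   public void readFields(DataInput in) throws IOException {
>     avgError = in.readDouble();
>     weights = new double[in.readInt()];
>     for (int i = 0; i < weights.length; i++) weights[i] = in.readDouble();
>   }
> }
>
> public class BatchNeuralNetworkBSP extends
>     BSP<LongWritable, Text, NullWritable, NullWritable, WeightMessage> {
>
>   static final int MAX_EPOCHS = 100;
>   static final double MIN_AVG_ERROR = 1e-3;
>
>   @Override
>   public void bsp(BSPPeer<LongWritable, Text, NullWritable, NullWritable,
>       WeightMessage> peer)
>       throws IOException, SyncException, InterruptedException {
>     double[] weights = new double[10]; // placeholder model
>     for (int epoch = 0; epoch < MAX_EPOCHS; epoch++) {
>       // 1) Forward pass over the local block of observations (stubbed).
>       WeightMessage local = trainLocalBlock(peer, weights);
>       // 2) Send the weights and the average error to every task, sync.
>       for (String other : peer.getAllPeerNames()) {
>         peer.send(other, local);
>       }
>       peer.sync();
>       // 3) Accumulate: average the #tasks weight vectors and errors.
>       double[] sum = new double[weights.length];
>       double errorSum = 0;
>       int n = 0;
>       WeightMessage m;
>       while ((m = peer.getCurrentMessage()) != null) {
>         errorSum += m.avgError;
>         for (int i = 0; i < sum.length; i++) sum[i] += m.weights[i];
>         n++;
>       }
>       for (int i = 0; i < weights.length; i++) weights[i] = sum[i] / n;
>       // 4) Backward step with the averaged error, then the next epoch.
>       backpropagate(weights, errorSum / n);
>       if (errorSum / n < MIN_AVG_ERROR) break;
>     }
>   }
>
>   // Placeholder: a real version would read this task's block via
>   // peer.readNext() and compute the output error of the network.
>   private WeightMessage trainLocalBlock(BSPPeer<LongWritable, Text,
>       NullWritable, NullWritable, WeightMessage> peer, double[] weights) {
>     WeightMessage m = new WeightMessage();
>     m.weights = weights.clone();
>     m.avgError = 0.0;
>     return m;
>   }
>
>   // Placeholder for the backward step on the averaged weights.
>   private void backpropagate(double[] weights, double avgError) {
>   }
> }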
>
> 2012/6/14 Thomas Jungblut <thomas.jungblut@googlemail.com>
>
> > Very cool project. I just need a few vectors and matrices, so I will
> > use my own library first.
> >
> > I'm still having a hard time distributing the network and updating it
> > accordingly in backprop. If you have smart ideas, let me know.
> >
> >
> > 2012/6/14 Tommaso Teofili <tommaso.teofili@gmail.com>
> >
> >> Hi Thomas,
> >> regarding neural networks, I'm also working on them within Apache Yay
> >> (my Apache Labs project [1]), and I agree it would make sense to run
> >> neural network algorithms on top of Hama. However, at this stage I
> >> have just a prototype in-memory implementation of feedforward neural
> >> networks (no actual learning).
> >> Apart from that, I think we need a math/linear algebra package running
> >> on top of Hama to make those algorithms scale nicely.
> >> I agree we can start from batch and then switch to online machine
> >> learning algorithms.
> >> Regards,
> >> Tommaso
> >>
> >> [1] : http://svn.apache.org/repos/asf/labs/yay/trunk/
> >>
> >> 2012/6/13 Thomas Jungblut <thomas.jungblut@googlemail.com>
> >>
> >> > I'm still going to focus on batch learning; my next aim would be to
> >> > try out neural networks with BSP.
> >> >
> >> >
> >> >
> >> > http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=685414&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2FFabs_all.jsp%2FFarnumber%2F685414
> >> >
> >> > http://techreports.cs.queensu.ca/files/1997-406.pdf
> >> >
> >> > Along with the pSVM, we would then have two strong learners. If
> >> > you're interested, send me a private message. But I have to write a
> >> > few exams next week, so I'm busy; this is just an idea, and we'll
> >> > see how fast I can get a prototype.
> >> >
> >> > Real time is difficult at the moment; we need the out-of-sync
> >> > messaging.
> >> >
> >> > 2012/6/13 Edward J. Yoon <edwardyoon@apache.org>
> >> >
> >> > > Thank you for sharing!
> >> > >
> >> > > On Wed, Jun 13, 2012 at 7:03 PM, Tommaso Teofili
> >> > > <tommaso.teofili@gmail.com> wrote:
> >> > > > Following up with this discussion on our dev list, I found an
> >> > > > introductory PDF on online ML which may be useful [1].
> >> > > > Apart from that, we can start by creating the module structure in
> >> > > > the Hama svn (still the incubator one, as the TLP move seems to
> >> > > > be taking a while).
> >> > > > Regards,
> >> > > > Tommaso
> >> > > >
> >> > > > [1] : http://www.springerlink.com/content/m480047m572t6262/fulltext.pdf
> >> > > >
> >> > > > 2012/5/25 Edward J. Yoon <edwardyoon@apache.org>
> >> > > >
> >> > > >> I'm roughly thinking of creating a new module so that I can add
> >> > > >> 3rd-party dependencies easily.
> >> > > >>
> >> > > >> On Fri, May 25, 2012 at 4:36 PM, Tommaso Teofili
> >> > > >> <tommaso.teofili@gmail.com> wrote:
> >> > > >> > Do you have a plan for that, Edward?
> >> > > >> > A separate package in examples, or a separate (online) machine
> >> > > >> > learning module? Or something else?
> >> > > >> > Regards
> >> > > >> > Tommaso
> >> > > >> >
> >> > > >> > 2012/5/25 Edward J. Yoon <edwardyoon@apache.org>
> >> > > >> >
> >> > > >> >> Okay, then let's get started.
> >> > > >> >>
> >> > > >> >> My first idea is a simple online recommendation system based
> >> > > >> >> on click-stream data.
> >> > > >> >>
> >> > > >> >> On Thu, May 24, 2012 at 6:26 PM, Praveen Sripati
> >> > > >> >> <praveensripati@gmail.com> wrote:
> >> > > >> >> > +1
> >> > > >> >> >
> >> > > >> >> > For those who are interested in ML, please check this; GNU
> >> > > >> >> > Octave is used.
> >> > > >> >> >
> >> > > >> >> > https://www.coursera.org/course/ml
> >> > > >> >> >
> >> > > >> >> > Another session is yet to be announced.
> >> > > >> >> >
> >> > > >> >> > Thanks,
> >> > > >> >> > Praveen
> >> > > >> >> >
> >> > > >> >> > On Thu, May 24, 2012 at 12:54 PM, Thomas Jungblut <
> >> > > >> >> > thomas.jungblut@googlemail.com> wrote:
> >> > > >> >> >
> >> > > >> >> >> +1
> >> > > >> >> >>
> >> > > >> >> >> 2012/5/24 Tommaso Teofili <tommaso.teofili@gmail.com>
> >> > > >> >> >>
> >> > > >> >> >> > and same here :)
> >> > > >> >> >> >
> >> > > >> >> >> > 2012/5/24 Vaijanath Rao <vaiju1981@gmail.com>
> >> > > >> >> >> >
> >> > > >> >> >> > > +1 me too
> >> > > >> >> >> > > On May 23, 2012 10:26 PM, "Aditya Sarawgi"
> >> > > >> >> >> > > <sarawgi.aditya@gmail.com> wrote:
> >> > > >> >> >> > >
> >> > > >> >> >> > > > +1
> >> > > >> >> >> > > > I would be happy to help :)
> >> > > >> >> >> > > >
> >> > > >> >> >> > > > On Wed, May 23, 2012 at 6:23 PM, Edward J. Yoon
> >> > > >> >> >> > > > <edwardyoon@apache.org> wrote:
> >> > > >> >> >> > > >
> >> > > >> >> >> > > > > Hi,
> >> > > >> >> >> > > > >
> >> > > >> >> >> > > > > Is anyone interested in online machine learning?
> >> > > >> >> >> > > > >
> >> > > >> >> >> > > > > --
> >> > > >> >> >> > > > > Best Regards, Edward J. Yoon
> >> > > >> >> >> > > > > @eddieyoon
> >> > > >> >> >> > > > >
> >> > > >> >> >> > > >
> >> > > >> >> >> > > >
> >> > > >> >> >> > > >
> >> > > >> >> >> > > > --
> >> > > >> >> >> > > > Cheers,
> >> > > >> >> >> > > > Aditya Sarawgi
> >> > > >> >> >> > > >
> >> > > >> >> >> > >
> >> > > >> >> >> >
> >> > > >> >> >>
> >> > > >> >> >>
> >> > > >> >> >>
> >> > > >> >> >> --
> >> > > >> >> >> Thomas Jungblut
> >> > > >> >> >> Berlin <thomas.jungblut@gmail.com>
> >> > > >> >> >>
> >> > > >> >>
> >> > > >> >>
> >> > > >> >>
> >> > > >> >> --
> >> > > >> >> Best Regards, Edward J. Yoon
> >> > > >> >> @eddieyoon
> >> > > >> >>
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >> --
> >> > > >> Best Regards, Edward J. Yoon
> >> > > >> @eddieyoon
> >> > > >>
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Best Regards, Edward J. Yoon
> >> > > @eddieyoon
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Thomas Jungblut
> >> > Berlin <thomas.jungblut@gmail.com>
> >> >
> >>
> >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin <thomas.jungblut@gmail.com>
> >
>
>
>
> --
> Thomas Jungblut
> Berlin <thomas.jungblut@gmail.com>
>
