hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Is Apache Hama suitable for building a decision tree?
Date Thu, 11 Oct 2012 16:50:48 GMT
Yes that is great, I will help you with that.

2012/10/11 Panos Mandros <mandros.p@gmail.com>

> Hey Thomas,
>     implementing PLANET was part of my bachelor thesis. It works not only
> for single-label learning but also for multi-label learning, as this is one
> of the areas my professor is interested in. It works fine, but there are
> still things that need to be done. One of these is to port it to Hama.
> Another is to find a more efficient way to transfer data from the mappers
> to the reducer, because right now the output is really big. If you want, we
> can cooperate on this.
>
> 2012/10/10 Thomas Jungblut <thomas.jungblut@gmail.com>
>
> > Hey Panos,
> >
> > thanks for transferring this.
> >
> > Here is the paper for the others:
> >
> > http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/de//pubs/archive/36296.pdf
> >
> > I wanted to do this, but did not have enough time :/
> > As I said on Stack Overflow, I think the graph package is the wrong
> > approach here; you can clearly translate the MapReduce algorithm to BSP
> > and make use of the faster iterations.
> >
> > Do you already have the code in MapReduce? I can simply turn this into
> > BSP.
> > I would like to support the creation of random forests as well, by
> > training a decision tree in every task and combining them later.
> >
> >
> > 2012/10/10 Panos Mandros <mandros.p@gmail.com>
> >
> > > I have currently implemented PLANET, Google's framework for building
> > > decision trees, in Hadoop. It is supposed to scale well to very large
> > > datasets, but it has many problems. It scales well only if the dataset
> > > has a few attributes. If a dataset has a lot of attributes, it needs a
> > > lot of map/reduce jobs, which means a big start-up cost for all of
> > > these jobs. Google, however, uses it with a lot of modifications to its
> > > Hadoop-like platform and not to the algorithm itself. PLANET starts
> > > with a single vertex, and with map/reduce jobs you add more and more
> > > until the tree is fully built.
> > >
> > > I have often seen that Apache Hama is suitable for iterative
> > > algorithms such as graph algorithms. Can someone build a new graph
> > > with Hama, or do you just take a graph as input and run some
> > > computations on it? Will it be easy to port my project to Hama? Thanks
> > >
> >
>
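
To make the mapper-to-reducer discussion above a bit more concrete, here is a rough Python sketch of one split-selection round. It is not the PLANET code from the thesis and it does not use the Hama or Hadoop APIs. It only assumes that each mapper (or BSP peer) sends per-split label counts instead of raw rows, and that one task merges them and picks the split with the lowest weighted Gini impurity. All names (local_statistics, merge_statistics, best_split) and the choice of Gini are made up for illustration.

```python
from collections import Counter, defaultdict

def local_statistics(rows, candidates):
    """One peer: count labels on each side of every candidate (feature, threshold)."""
    stats = defaultdict(Counter)
    for features, label in rows:
        for feature, threshold in candidates:
            side = "L" if features[feature] <= threshold else "R"
            stats[(feature, threshold, side)][label] += 1
    return stats

def merge_statistics(all_stats):
    """Master/reducer: add up the count tables received from all peers."""
    merged = defaultdict(Counter)
    for stats in all_stats:
        for key, counts in stats.items():
            merged[key].update(counts)
    return merged

def gini(counts):
    """Gini impurity of a label-count table (0.0 for an empty table)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def best_split(merged, candidates):
    """Pick the candidate whose two sides have the lowest weighted impurity."""
    def weighted_impurity(candidate):
        feature, threshold = candidate
        left = merged[(feature, threshold, "L")]
        right = merged[(feature, threshold, "R")]
        n_left, n_right = sum(left.values()), sum(right.values())
        return (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)
    return min(candidates, key=weighted_impurity)

# Two hypothetical peers, each holding part of the data.
peer_a = [([0, 0], "a"), ([1, 0], "a")]
peer_b = [([0, 1], "b"), ([1, 1], "b")]
candidates = [(0, 0.5), (1, 0.5)]

merged = merge_statistics(local_statistics(p, candidates) for p in (peer_a, peer_b))
print(best_split(merged, candidates))
```

With this layout the amount of data a peer sends depends on the number of candidate splits and labels, not on the number of rows it holds. In a BSP version, one such round could be one superstep, repeated per tree level without starting a new job each time.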

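The random forest idea from the thread (train a decision tree in every task, combine them later) could look roughly like the following Python sketch. It is only an illustration under simplifying assumptions: each "task" is just a slice of a list, each "tree" is a one-split stump, and the combination step is a plain majority vote. None of the function names come from Hama or from the PLANET implementation.

```python
from collections import Counter

def train_stump(rows):
    """Fit a one-split tree on (features, label) rows by minimising training errors."""
    best = None
    for feature in range(len(rows[0][0])):
        for threshold in sorted({x[feature] for x, _ in rows}):
            left = Counter(y for x, y in rows if x[feature] <= threshold)
            right = Counter(y for x, y in rows if x[feature] > threshold)
            # Errors = rows that disagree with the majority label on their side.
            errors = sum(left.values()) - max(left.values(), default=0)
            errors += sum(right.values()) - max(right.values(), default=0)
            if best is None or errors < best[0]:
                left_label = left.most_common(1)[0][0]
                right_label = right.most_common(1)[0][0] if right else left_label
                best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def predict_stump(stump, features):
    feature, threshold, left_label, right_label = stump
    return left_label if features[feature] <= threshold else right_label

def predict_forest(stumps, features):
    """Combine the per-task models by majority vote."""
    votes = Counter(predict_stump(s, features) for s in stumps)
    return votes.most_common(1)[0][0]

# Toy data: the label is 1 when the single feature is greater than 5.
rows = [([i], int(i > 5)) for i in range(12)]
partitions = [rows[i::3] for i in range(3)]       # one slice per hypothetical task
forest = [train_stump(part) for part in partitions]

print(predict_forest(forest, [2]), predict_forest(forest, [9]))
```

Each task only needs its own partition to train, so the only communication is sending the finished models to whoever combines them.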