hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Is Apache Hama suitable for building a decision tree?
Date Wed, 10 Oct 2012 19:32:07 GMT
Hey Panos,

thanks for transferring this.

Here is the paper for the others:
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/de//pubs/archive/36296.pdf

I wanted to do this myself, but I have not had enough time :/
As I said on Stack Overflow, I think the graph package is the wrong approach
here. You can translate the MapReduce algorithm directly to BSP
and make use of the faster iterations.
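
To make the idea concrete, here is a minimal sketch of one split-finding round
written in a BSP style. It is plain Python and does not use the Hama API: the
"peers" are just list partitions, the list of messages stands in for the
barrier sync, and the names (local_stats, best_split) and the Gini criterion
are my own assumptions for illustration, not what PLANET or Hama prescribe.

```python
from collections import Counter


def local_stats(rows, candidates):
    """Compute phase on one peer: label counts on each side of every candidate split."""
    stats = {}
    for feature, threshold in candidates:
        left, right = Counter(), Counter()
        for features, label in rows:
            side = left if features[feature] <= threshold else right
            side[label] += 1
        stats[(feature, threshold)] = (left, right)
    return stats


def gini(counts):
    """Gini impurity of a label histogram (0.0 for an empty or pure histogram)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(partitions, candidates):
    """One simulated superstep: peers compute locally, a master merges and decides."""
    # Compute phase: every peer only looks at its own partition.
    messages = [local_stats(rows, candidates) for rows in partitions]

    # Stand-in for the barrier sync: the master receives and merges all messages.
    merged = {}
    for message in messages:
        for key, (left, right) in message.items():
            total_left, total_right = merged.setdefault(key, (Counter(), Counter()))
            total_left.update(left)    # Counter.update adds counts
            total_right.update(right)

    def weighted_gini(item):
        left, right = item[1]
        n_left, n_right = sum(left.values()), sum(right.values())
        return (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)

    # Assumes at least one row overall, so the denominator is never zero.
    return min(merged.items(), key=weighted_gini)[0]
```

Growing the tree would repeat this round once per level inside the same
running job, which is where the saving over one MapReduce job per level
should come from.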

Do you already have the code in MapReduce? If so, I can port it to BSP.
I would also like to support building random forests, by training
a decision tree in every task and combining the trees afterwards.
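
Roughly what I have in mind for the forest part, as a sketch only: each task
trains on its own partition and the results are combined by majority vote.
This is plain Python, not Hama code. To keep it short, a depth-1 tree (a
stump) stands in for a full decision tree, and all function names here are
hypothetical. A real random forest would also sample rows and features at
random, which this sketch leaves out.

```python
from collections import Counter


def train_stump(rows):
    """Train a depth-1 tree on one task's rows by minimising misclassifications."""
    best = None
    n_features = len(rows[0][0])
    for f in range(n_features):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = Counter(y for x, y in rows if x[f] <= threshold)
            right = Counter(y for x, y in rows if x[f] > threshold)
            if not left or not right:
                continue  # this threshold does not actually split the data
            left_label, left_hits = left.most_common(1)[0]
            right_label, right_hits = right.most_common(1)[0]
            errors = len(rows) - left_hits - right_hits
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No usable split: always predict the majority label.
        majority = Counter(y for _, y in rows).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, x):
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label


def train_forest(partitions):
    """One model per task; in a BSP job each peer would train its own."""
    return [train_stump(rows) for rows in partitions]


def predict_forest(forest, x):
    """Combine the per-task models by majority vote."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]
```

Since the tasks never need to talk to each other during training, the only
communication is collecting the trained trees at the end.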


2012/10/10 Panos Mandros <mandros.p@gmail.com>

> I have implemented PLANET, Google's framework for building decision
> trees, in Hadoop. It is supposed to scale well to very large datasets,
> but it has many problems. It only scales well if the dataset has a few
> attributes. If a dataset has a lot of attributes, it needs a lot of
> map/reduce jobs, which means a big start-up cost for all of these jobs.
> Google, however, runs it with a lot of modifications to its Hadoop-like
> platform rather than to the algorithm itself. PLANET starts with a
> single vertex, and map/reduce jobs add more and more vertices until the
> tree is fully built.
>
> I have often read that Apache Hama is suitable for iterative
> algorithms such as graph algorithms. Can you build a new graph with
> Hama, or can you only take a graph as input and run computations on
> it? Would it be easy to port my project to Hama? Thanks
>
