hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Is Apache Hama suitable for building a decision tree?
Date Thu, 11 Oct 2012 18:03:11 GMT
So we talked about this and want to get it done within a few weeks, so
stay tuned. I will open a JIRA issue for it soon.

2012/10/11 Thomas Jungblut <thomas.jungblut@gmail.com>

> Yes that is great, I will help you with that.
>
>
> 2012/10/11 Panos Mandros <mandros.p@gmail.com>
>
>> Hey Thomas,
>>     implementing PLANET was part of my bachelor thesis. It works not only
>> for single-label learning but also for multi-label learning, as this is
>> one of the areas my professor is interested in. It works fine but still
>> has things that need to be done. One of these is to port it to Hama.
>> Another is to find a more efficient way to transfer data from the mappers
>> to the reducer, because right now the output is really big. If you want,
>> we can cooperate on this.
>>
>> 2012/10/10 Thomas Jungblut <thomas.jungblut@gmail.com>
>>
>> > Hey Panos,
>> >
>> > thanks for transferring this.
>> >
>> > Here is the paper for the others:
>> >
>> >
>> > http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/de//pubs/archive/36296.pdf
>> >
>> > I wanted to do this, not enough time :/
>> > As I said on Stack Overflow, I think the graph package is the wrong
>> > approach here; you can clearly translate the MapReduce algorithm to BSP
>> > and make use of the faster iterations.
>> >
>> > Do you already have the code in MapReduce? I can simply turn this into
>> > BSP. I would like to support the creation of random forests as well, by
>> > training a decision tree in every task and combining them later.
>> >
>> >
>> > 2012/10/10 Panos Mandros <mandros.p@gmail.com>
>> >
>> > > I have currently implemented Google's framework for building decision
>> > > trees (also known as PLANET) in Hadoop. It is supposed to scale well to
>> > > very large datasets, but it has many problems. It only scales well if
>> > > the dataset has few attributes. If a dataset has a lot of attributes,
>> > > it needs a lot of map/reduce jobs, which means a big start-up cost for
>> > > all of these jobs. Google, however, uses it with a lot of modifications
>> > > to its Hadoop-like platform, not to the algorithm itself. PLANET starts
>> > > with a single vertex, and map/reduce jobs add more and more vertices
>> > > until the tree is fully built.
>> > >
>> > > I have often read that Apache Hama is suitable for iterative
>> > > algorithms such as graph algorithms. Can one build a new graph with
>> > > Hama, or can you only take a graph as input and run computations on
>> > > it? Would it be easy to port my project to Hama? Thanks
>> > >
>> >
>>
>
>

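The random-forest idea mentioned in the thread (train one decision tree in every task, combine them later) can be sketched roughly as follows. This is plain Python for illustration only: it uses neither Hama's BSP API nor the PLANET algorithm. Each "task" is simulated by a round-robin slice of the data, each "tree" is only a one-level stump, and the combining step is assumed to be a majority vote. All function names are hypothetical.

```python
from collections import Counter


def train_stump(rows):
    """Fit a one-level decision tree on rows of (features, label).

    Tries every (feature index, threshold) pair seen in the data and keeps
    the first one with the fewest misclassifications.
    """
    best = None
    n_features = len(rows[0][0])
    for f in range(n_features):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            # left is never empty: the threshold is a value present in rows.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = (
                Counter(right).most_common(1)[0][0] if right else left_label
            )
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def predict_stump(stump, x):
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label


def train_forest(rows, num_tasks):
    """Simulate num_tasks independent tasks, each training on its own slice."""
    # Assumption: round-robin slices stand in for per-task input splits.
    partitions = [rows[i::num_tasks] for i in range(num_tasks)]
    return [train_stump(p) for p in partitions if p]


def predict_forest(forest, x):
    """Combine the per-task models by majority vote."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = [((i,), "low") for i in range(10)]
    data += [((i,), "high") for i in range(100, 110)]
    forest = train_forest(data, num_tasks=3)
    print(predict_forest(forest, (0,)), predict_forest(forest, (200,)))
```

In a BSP setting, the training step would presumably run once per task on local data, with the trained models sent to a single peer for combination. That mapping is an assumption here, not something the thread specifies.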