hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: compared with MapReduce ,what is the advantage of HAMA?
Date Sat, 24 Sep 2011 09:44:29 GMT
Thanks for your tips; I will forward this to our dev list for discussion.

2011/9/24 changguanghui <changguanghui@huawei.com>

> I think it may be important to find some algorithms or problems that are
> better suited to HAMA. Then people can compare the results between HAMA and
> MapReduce, because many people want to know why and when they should
> choose HAMA.
> -----Original Message-----
> From: Thomas Jungblut [mailto:thomas.jungblut@googlemail.com]
> Sent: 23 September 2011 19:39
> To: hama-user@incubator.apache.org
> Subject: Re: compared with MapReduce ,what is the advantage of HAMA?
> Hi,
> to clearly state the advantage: you have less overhead.
> Let me illustrate this with an algorithm for mindist search, which I renamed
> to graph exploration. The same reasoning applies to shortest paths, too.
> I wrote about it here:
> http://codingwiththomas.blogspot.com/2011/04/graph-exploration-with-hadoop-mapreduce.html
> Basically the algorithm groups the components of the graph and assigns the
> lowest key of the group as an identifier for the component.
> Graph problems are usually solved in MapReduce with a technique called
> "message passing".
> In every map step you send messages to other vertices. Then you have to
> shuffle, sort and reduce the vertices to compute the result.
> This isn't done with a single iteration, so you have to chain several
> map/reduce jobs.
> For each iteration you incur the overhead of sorting and shuffling.
> Additionally, all of this has to go through the disk.
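
A rough in-memory sketch of the chained map/shuffle/reduce iteration described above, applied to the "lowest key per component" labelling. It is plain Python, not Hadoop code: the function names and the toy graph are assumptions made for illustration, and each pass through the loop stands in for one full MapReduce job.

```python
from collections import defaultdict


def map_phase(labels, adjacency):
    """Each vertex emits its current label to itself and to every neighbour."""
    for vertex, label in labels.items():
        yield vertex, label  # keep the vertex's own label in play
        for neighbour in adjacency[vertex]:
            yield neighbour, label


def shuffle_and_sort(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Each vertex keeps the lowest label it has seen."""
    return {vertex: min(values) for vertex, values in grouped.items()}


def label_components_mapreduce(adjacency):
    """Chain 'jobs' until no label changes; returns (labels, number of jobs)."""
    labels = {vertex: vertex for vertex in adjacency}
    jobs = 0
    while True:
        new_labels = reduce_phase(shuffle_and_sort(map_phase(labels, adjacency)))
        jobs += 1
        if new_labels == labels:
            return labels, jobs
        labels = new_labels


# Hypothetical undirected toy graph with two components: {1, 2, 3} and {4, 5}.
graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
labels, jobs = label_components_mapreduce(graph)
```

In a real cluster every one of those loop passes would pay for job setup, the shuffle and sort, and reading and writing the intermediate state on disk.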
> Hama provides a message passing interface, so you don't have to take care
> of writing each message to HDFS.
> Each iteration, which in MapReduce is a full job execution, is called a
> superstep in BSP.
> Each superstep is faster than a full job execution in Hadoop, because you
> don't have the overhead of spilling to disk, job setup, sorting and
> shuffling.
> In addition, you can keep your whole graph in RAM, which speeds up the
> computation further. Hadoop does not offer this capability yet.
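
For comparison, a minimal sketch of the same component labelling written in a BSP style: the graph and labels stay in memory, vertices exchange messages directly, and swapping the message queues plays the role of the barrier between supersteps. This is a single-process simulation in plain Python, not the Hama API; all names and the toy graph are assumptions for illustration.

```python
def label_components_bsp(adjacency):
    """Propagate the lowest vertex id through each component in supersteps."""
    labels = {vertex: vertex for vertex in adjacency}  # state stays in memory

    # Superstep 0: every vertex sends its own id to its neighbours.
    inbox = {vertex: [] for vertex in adjacency}
    for vertex, neighbours in adjacency.items():
        for neighbour in neighbours:
            inbox[neighbour].append(labels[vertex])
    supersteps = 1

    # Keep running supersteps while any vertex still has messages to process.
    while any(inbox.values()):
        outbox = {vertex: [] for vertex in adjacency}
        for vertex, messages in inbox.items():
            if messages and min(messages) < labels[vertex]:
                labels[vertex] = min(messages)
                # Only vertices whose label changed send messages again.
                for neighbour in adjacency[vertex]:
                    outbox[neighbour].append(labels[vertex])
        inbox = outbox  # barrier: messages become visible in the next superstep
        supersteps += 1
    return labels, supersteps


# Hypothetical undirected toy graph with two components: {1, 2, 3} and {4, 5}.
graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
bsp_labels, bsp_supersteps = label_components_bsp(graph)
```

The point of the sketch is the shape of the loop: there is no sort or shuffle between iterations and nothing is written out between supersteps, only messages from vertices whose label actually changed.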
> But I also want to point out some downsides:
> Currently no benchmarks against Hadoop or other frameworks like Giraph or
> GoldenORB exist, so we can't say: we are the best/fastest/coolest.
> Graph algorithms are also hard to code. As you can see, I have written
> lots of code to get this running. That is because I have to take care of
> the partitioning, vertex messaging and IO myself.
> For that purpose we are going to release a Pregel API which makes the
> development of graph algorithms a lot easier.
> You can get a sneak peek here:
> https://issues.apache.org/jira/browse/HAMA-409
> That was a lot of text, but I hope it clarifies a lot.
> Best Regards,
> Thomas
> 2011/9/23 changguanghui <changguanghui@huawei.com>
> > Hi Thomas,
> >
> > Could you provide a concrete example to illustrate the advantage of
> > HAMA over MapReduce?
> >
> > For example, SSSP on HAMA vs. SSSP on MapReduce, so that I can grasp the
> > idea of HAMA quickly.
> >
> > Thank you very much!
> >
> > Changguanghui
> >

Thomas Jungblut

mobile: 0170-3081070

business: thomas.jungblut@testberichte.de
private: thomas.jungblut@gmail.com
