hama-dev mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Port to YARN: GIRAPH and HAMA
Date Tue, 13 Sep 2011 17:47:19 GMT
Hi Vinod,

Edward and I have chatted about this at times.  It sounds better in 
theory (both BSP based and adding support for MRv2) than in practice I 
think (underlying implementations are quite different).  Actually, I 
also believe that in the future, Giraph is not going to solely be 
BSP-based graph computing.  We are also thinking about other underlying 
computing models (i.e. streaming (asynchronous) graph processing - see


But I think today, the issues are the following:

1)  Giraph runs completely as a MapReduce job on Hadoop today.  This 
needs to be maintained to support our current users, who will not likely 
move to MRv2 for at least a year.
2)  The internals of Giraph are implemented differently from Hama's, and 
porting them to Hama would take some time.
3)  If we have various graph processing computing models (BSP based, 
streams or asynchronous, or a combination), then being on Hama brings 
little value for Giraph.

Perhaps more practically, I wonder if it would be possible for someone 
from the Hama team to refactor our code a bit to support Hama-style BSP 
in Giraph?  Certainly would be a pretty cool project...


On 9/13/11 4:49 AM, Edward J. Yoon wrote:
> Quite a while ago, I implemented a clone of Google Pregel simply using
> BSPLib[1] and decided to focus on the BSP computing engine.
> The Hama and Giraph projects differ in slogan but not in kind.
> If we collaborate, Giraph should be implemented on top of the
> Hama BSP computing engine.
> Otherwise, we will be back to square one again.
> 1. http://markmail.org/thread/4czcgtjupjvpqcqi
> On Sun, Sep 11, 2011 at 11:22 PM, Vinod Kumar Vavilapalli
> <vinodkv@hortonworks.com>  wrote:
>> Crosspost to hama-dev and giraph-dev.
>> Just this morning I was looking at HAMA-431, the port of
>> Hama to YARN. And one of the tweets reminded me of JIRA issue GIRAPH-13
>> which is about porting Giraph to YARN.
>> I was also looking at the Giraph proposal for entry into Apache Incubator.
>> There is an interesting section there:
>> {quote}
>> Relationships with Other Apache Products
>> Giraph has some overlapping functionality with Apache Hama. However, there
>> are some significant differences. Giraph focuses on graph-based bulk
>> synchronous parallel (BSP) computing, while Apache Hama is more for general
>> purposed BSP computing. Giraph runs on the Hadoop infrastructure, while
>> Apache Hama uses its own computing framework.
>> {quote}
>> I agree with the point about Hama being a general-purpose BSP and Giraph
>> being completely graph oriented. But the latter one about the infrastructure
>> is going to be moot with both Giraph and Hama trying to be ported over to
>> YARN.
>> So here's my billion dollar question: Is it possible to implement Giraph's
>> graph-based APIs over Hama's BSP APIs, with both running over a single
>> Apache BSP implementation on YARN?
>> I also do see the email thread regarding Hama and Giraph's future
>> collaboration when Hadoop NextGen aka YARN comes in:
>> http://s.apache.org/HamaVsGiraph. So are we ready for this yet?
>> Disclaimer: I come from the Hadoop world, have no idea of Giraph's APIs or
>> internals except that I see a bsp package in Giraph's source tree. I do know
>> a tiny bit about Hama's APIs and internals but my expertise is only two days old.
>> Thanks,
>> +Vinod
>> (An elephant maintainer trying to see if a Giraffe can be made to ride over
>> a hippopotamus riding over an elephant)
