From: Avery Ching
Date: Tue, 13 Sep 2011 10:47:19 -0700
To: giraph-dev@incubator.apache.org
CC: "Edward J. Yoon", hama-dev@incubator.apache.org
Subject: Re: Port to YARN: GIRAPH and HAMA
Message-ID: <4E6F9727.4000802@apache.org>

Hi Vinod,

Edward and I have chatted about this at times. I think it sounds better in theory (both projects are BSP-based and both are adding support for MRv2) than in practice (the underlying implementations are quite different). Actually, I also believe that in the future, Giraph is not going to be solely BSP-based graph computing.
We are also thinking about other underlying computing models (e.g. streaming (asynchronous) graph processing); see
http://mail-archives.apache.org/mod_mbox/incubator-giraph-user/201109.mbox/%3CCAEVHzWC8b-7RiBjkDiQKjiu-rVBz9=ogEOajXHbCLCR5n3+QVg@mail.gmail.com%3E

But I think today, the issues are the following:

1) Giraph runs completely as a MapReduce job on Hadoop today. This needs to be maintained to support our current users, who will likely not move to MRv2 for at least a year.
2) The internals of Giraph are implemented differently from Hama's, and porting them would take some time.
3) If we have various graph processing computing models (BSP-based, streaming or asynchronous, or a combination), then being on Hama brings little value for Giraph.

Perhaps more practically, I wonder if it would be possible for someone from the Hama team to refactor our code a bit to support Hama-style BSP in Giraph? Certainly would be a pretty cool project...

Avery

On 9/13/11 4:49 AM, Edward J. Yoon wrote:
> Quite a while ago, I implemented a clone of Google Pregel simply using
> BSPLib[1] and decided to focus on the BSP computing engine.
>
> The Hama and Giraph projects differ in slogan but not in kind.
>
> If we collaborate, Giraph should be implemented on top of the
> Hama BSP computing engine.
>
> Otherwise, we will be back to square one again.
>
> 1. http://markmail.org/thread/4czcgtjupjvpqcqi
>
> On Sun, Sep 11, 2011 at 11:22 PM, Vinod Kumar Vavilapalli
> wrote:
>> Crosspost to hama-dev and giraph-dev.
>>
>> It was only this morning that I was looking at HAMA-431, the port of
>> Hama to YARN. And one of the tweets reminded me of JIRA issue GIRAPH-13
>> which is about porting Giraph to YARN.
>>
>> I was also looking at the Giraph proposal for entry into Apache Incubator.
>> There is an interesting section there:
>> {quote}
>> Relationships with Other Apache Products
>>
>> Giraph has some overlapping functionality with Apache Hama.
>> However, there are some significant differences. Giraph focuses on graph-based bulk
>> synchronous parallel (BSP) computing, while Apache Hama is more for general
>> purposed BSP computing. Giraph runs on the Hadoop infrastructure, while
>> Apache Hama uses its own computing framework.
>> {quote}
>>
>> I agree with the point about Hama being a general-purpose BSP and Giraph
>> being completely graph oriented. But the latter one about the infrastructure
>> is going to be moot with both Giraph and Hama trying to be ported over to
>> YARN.
>>
>> So here's my billion dollar question: Is it possible to implement Giraph's
>> graph-based APIs over Hama's BSP APIs, with both running over a single
>> Apache BSP implementation over YARN?
>>
>> I also do see the email thread regarding Hama and Giraph's future
>> collaboration when Hadoop NextGen aka YARN comes in:
>> http://s.apache.org/HamaVsGiraph. So are we ready for this yet?
>>
>> Disclaimer: I come from the Hadoop world, have no idea of Giraph's APIs or
>> internals except that I see a bsp package in Giraph's source tree. I do know
>> a tiny bit about Hama's APIs and internals but my expertise is only two days old.
>>
>> Thanks,
>> +Vinod
>> (An elephant maintainer trying to see if a Giraffe can be made to ride over
>> a hippopotamus riding over an elephant)