From: Avery Ching
To: giraph-user@incubator.apache.org
Date: Thu, 08 Dec 2011 23:21:40 -0800
Subject: Re: Comparing BSP and MR

Hi Praveen,

Answers inline.  Hope that helps!

Avery

On 12/8/11 10:16 PM, Praveen Sripati wrote:
> Hi,
>
> I know MapReduce/Hadoop and am trying to get my head around BSP/Hama-Giraph by comparing MR and BSP.
>
> - The Map phase in MR is similar to the computation phase in BSP. BSP allows processes to exchange data in the communication phase, but there is no communication between mappers in the Map phase, although data does flow from Map tasks to Reduce tasks. Please correct me if I am wrong. Are there any other significant differences?

I suppose you can think of it that way. I like to compare a BSP superstep to a whole MapReduce job, since each involves both computation and communication (in MapReduce, the map phase followed by the shuffle to the reducers).
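
To make the comparison concrete, here is a minimal Python sketch of the superstep model. It is not Giraph code, and every name in it is hypothetical. Each vertex computes on the messages sent to it during the previous superstep; the messages it sends are only delivered after the barrier, in the next superstep. That cross-task exchange is what the Map phase lacks, since mappers never read each other's output within a job. For simplicity the loop stops when no vertex changes, standing in for every vertex having voted to halt with no messages in flight.

```python
from collections import defaultdict

# Hypothetical toy graph: vertex id -> out-neighbours, plus initial values.
GRAPH = {1: [2], 2: [1, 3], 3: [2]}
INITIAL = {1: 3, 2: 6, 3: 1}


def run_supersteps(graph, initial_values):
    """Propagate the maximum value through the graph, BSP style.

    Returns (final_values, number_of_supersteps_run).
    """
    values = dict(initial_values)
    inbox = defaultdict(list)  # messages readable in the current superstep
    superstep = 0
    while True:
        outbox = defaultdict(list)  # messages delivered in the next superstep
        changed = False
        # Computation phase: each vertex sees only last superstep's messages.
        for v, neighbours in graph.items():
            candidate = max([values[v]] + inbox[v])
            if superstep == 0 or candidate > values[v]:
                values[v] = candidate
                changed = True
                # Communication: queued now, visible only after the barrier.
                for n in neighbours:
                    outbox[n].append(candidate)
        # Barrier: swap buffers so the next superstep reads these messages.
        inbox = outbox
        superstep += 1
        # Simplified halting: nothing changed, so no messages are in flight.
        if not changed:
            return values, superstep
```

Running `run_supersteps(GRAPH, INITIAL)` gives `({1: 6, 2: 6, 3: 6}, 3)`: the maximum reaches vertices 1 and 3 in superstep 1, and superstep 2 confirms that nothing changes.
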
> - After going through the documentation for Hama and Giraph, I noticed that they both use Hadoop as the underlying framework. In both Hama and Giraph an MR job is submitted. Does each superstep in BSP correspond to a job in MR? Where are the incoming and outgoing messages and the state stored: HDFS, HBase, local disk, or something pluggable?

My understanding of Hama is that it has its own BSP framework (and will have its own BSP launching framework). Giraph, by contrast, runs on an existing Hadoop installation and does not have its own computational framework. A Giraph job is submitted to Hadoop as a single Map-only job, and all supersteps run inside that one job rather than as one MapReduce job per superstep.
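
The single Map-only job can be sketched the same way. In this hypothetical Python model, threads stand in for the long-running map tasks acting as workers, and a `threading.Barrier` stands in for the synchronization Giraph performs between supersteps. The same tasks stay alive for every superstep, so the graph can remain in memory instead of being reloaded by a new job each time. The worker and superstep counts are arbitrary assumptions.

```python
import threading

NUM_WORKERS = 3      # hypothetical: map tasks acting as workers
NUM_SUPERSTEPS = 4   # hypothetical fixed number of supersteps


def run_job(num_workers=NUM_WORKERS, num_supersteps=NUM_SUPERSTEPS):
    """One 'job' whose long-lived tasks run every superstep, syncing at a barrier.

    Returns a log of (superstep, worker_id) pairs in completion order.
    """
    barrier = threading.Barrier(num_workers)
    log = []
    lock = threading.Lock()

    def worker(worker_id):
        for superstep in range(num_supersteps):
            # ... compute on this worker's vertices here ...
            with lock:
                log.append((superstep, worker_id))
            # No worker starts superstep s + 1 until all have finished superstep s.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because every worker logs its superstep before waiting at the barrier, the superstep numbers in the returned log never decrease, whatever order the threads happen to run in.
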

In Giraph, all state is kept in memory. Graphs are loaded and stored through VertexInputFormat and VertexOutputFormat (much like Hadoop's InputFormat and OutputFormat). You can implement your own VertexInputFormat/VertexOutputFormat to use HDFS, HBase, etc. as the stable storage for your graph.
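
The pluggable-storage idea can be illustrated with a small Python sketch. The classes below are hypothetical analogues of VertexInputFormat/VertexOutputFormat, not Giraph's actual Java API: a reader yields vertices from some source, the loaded graph lives entirely in an in-memory dict, and a writer sends results to some sink. Switching to HDFS, HBase, or another store would amount to swapping the reader or writer class. The tab-separated line format is an assumption.

```python
from abc import ABC, abstractmethod


class VertexInput(ABC):
    """Hypothetical analogue of a vertex input format: where graph data comes from."""

    @abstractmethod
    def read_vertices(self):
        """Yield (vertex_id, value, neighbour_ids) tuples."""


class VertexOutput(ABC):
    """Hypothetical analogue of a vertex output format: where results go."""

    @abstractmethod
    def write_vertex(self, vertex_id, value):
        ...


class TabSeparatedInput(VertexInput):
    """Parses lines like '1<TAB>3.0<TAB>2,3' (id, value, comma-separated neighbours)."""

    def __init__(self, lines):
        self.lines = lines

    def read_vertices(self):
        for line in self.lines:
            vid, value, neighbours = line.rstrip("\n").split("\t")
            ids = [int(n) for n in neighbours.split(",") if n]
            yield int(vid), float(value), ids


class ListOutput(VertexOutput):
    """Collects results in a list; an HDFS- or HBase-backed writer would replace this."""

    def __init__(self):
        self.rows = []

    def write_vertex(self, vertex_id, value):
        self.rows.append(f"{vertex_id}\t{value}")


def load_graph(source):
    """Once loaded, all vertex state lives in memory."""
    return {vid: {"value": value, "edges": edges}
            for vid, value, edges in source.read_vertices()}


def store_graph(graph, sink):
    for vid in sorted(graph):
        sink.write_vertex(vid, graph[vid]["value"])
```

The computation itself only ever touches the in-memory dict, which is why the storage backend can be changed without touching the algorithm.
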

> - If a vertex is deactivated and then activated again after receiving a message, does it run on the same node or on a different node in the cluster?

In Giraph, vertices can be reassigned to different workers between supersteps. A vertex always runs on the worker it is currently assigned to, so a reactivated vertex may run on a different node than before.
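
A hypothetical Python sketch of why a reactivated vertex can land on a different node. It assumes vertices are grouped into partitions by `vertex_id % num_partitions` and that a master may reassign whole partitions to workers between supersteps. This is an illustrative scheme, not necessarily Giraph's exact partitioning. The vertex's location is looked up afresh each superstep, so after a move it runs on the new worker.

```python
def partition_of(vertex_id, num_partitions):
    """Hypothetical: vertices are grouped into partitions by their integer id."""
    return vertex_id % num_partitions


class PartitionAssignment:
    """Hypothetical partition -> worker map that can change between supersteps."""

    def __init__(self, num_partitions, workers):
        self.num_partitions = num_partitions
        # Initial round-robin placement of partitions onto workers.
        self.owner = {p: workers[p % len(workers)] for p in range(num_partitions)}

    def worker_for(self, vertex_id):
        return self.owner[partition_of(vertex_id, self.num_partitions)]

    def move_partition(self, partition, new_worker):
        # In a real system this would happen only at a superstep boundary and
        # the partition's vertices would be shipped to the new worker; here we
        # just update the map.
        self.owner[partition] = new_worker
```

For example, with 4 partitions on workers `"w0"` and `"w1"`, vertex 5 (partition 1) starts on `"w1"`; after `move_partition(1, "w0")` it runs on `"w0"`, whether or not it was halted in between.
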

> Regards,
> Praveen
