Reply-To: giraph-user@incubator.apache.org
Message-ID: <4EE1B704.2000600@apache.org>
Date: Thu, 08 Dec 2011 23:21:40 -0800
From: Avery Ching <aching@apache.org>
MIME-Version: 1.0
To: giraph-user@incubator.apache.org
Subject: Re: Comparing BSP and MR
References: 
 <CADYHM8zOJv0UgK1T+0-ehO5V15CXupcGyOVSDMzBx4Rx5nnBbg@mail.gmail.com>
In-Reply-To: 
 <CADYHM8zOJv0UgK1T+0-ehO5V15CXupcGyOVSDMzBx4Rx5nnBbg@mail.gmail.com>
Content-Type: multipart/alternative;
 boundary="------------010909070408080203090600"

--------------010909070408080203090600
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi Praveen,

Answers inline.  Hope that helps!

Avery

On 12/8/11 10:16 PM, Praveen Sripati wrote:
> Hi,
>
> I know MapReduce/Hadoop and am trying to get my head around 
> BSP/Hama-Giraph by comparing MR and BSP.
>
> - The Map phase in MR is similar to the computation phase in BSP. BSP 
> allows processes to exchange data in the communication phase, but 
> there is no communication between mappers in the Map phase, although 
> data does flow from Map tasks to Reduce tasks. Please correct me if I 
> am wrong. Are there any other significant differences?
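I suppose you can think of it that way.  I like to compare a BSP 
superstep to a whole MapReduce job instead, since each one combines a 
computation step with a communication step (in MR, the map phase 
followed by the shuffle to the reducers).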
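
To make the superstep comparison concrete, here is a minimal 
single-process Java sketch. It is not Giraph code; the class name 
BspSketch, the sample graph, and the choice of "propagate the maximum 
value" as the algorithm are assumptions for illustration. Each loop 
iteration is one superstep: a computation phase over the active vertices 
(roughly the map phase), then delivery of the messages sent during that 
phase (roughly the shuffle), with the end of the iteration acting as the 
barrier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Single-process sketch of BSP supersteps (illustration only, not Giraph).
 * One loop iteration = one superstep: compute on active vertices, then
 * deliver the messages they sent, which act as the barrier/communication step.
 */
public class BspSketch {

    /** Propagates the maximum vertex value across an undirected graph. */
    static Map<Integer, Integer> propagateMax(Map<Integer, List<Integer>> edges,
                                              Map<Integer, Integer> initial) {
        Map<Integer, Integer> values = new HashMap<>(initial);
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        int superstep = 0;
        // Stop once a superstep (after the first) produces no messages.
        while (superstep == 0 || !inbox.isEmpty()) {
            Map<Integer, List<Integer>> outbox = new HashMap<>();
            // Computation phase: every vertex runs in superstep 0; afterwards
            // only vertices that received messages are reactivated.
            List<Integer> active = new ArrayList<>(
                    superstep == 0 ? values.keySet() : inbox.keySet());
            for (int v : active) {
                int current = values.get(v);
                int best = current;
                for (int msg : inbox.getOrDefault(v, List.of())) {
                    best = Math.max(best, msg);
                }
                if (superstep == 0 || best > current) {
                    values.put(v, best);
                    // Sent messages go to the outbox; neighbors see them next superstep.
                    for (int n : edges.getOrDefault(v, List.of())) {
                        outbox.computeIfAbsent(n, k -> new ArrayList<>()).add(best);
                    }
                }
                // Otherwise the vertex stays halted until a new message arrives.
            }
            // Barrier + communication phase: messages from superstep S
            // become the inbox for superstep S + 1.
            inbox = outbox;
            superstep++;
        }
        return values;
    }

    public static void main(String[] args) {
        // Path graph 1 - 2 - 3 - 4 with initial values 3, 6, 2, 1.
        Map<Integer, List<Integer>> edges = Map.of(
                1, List.of(2), 2, List.of(1, 3), 3, List.of(2, 4), 4, List.of(3));
        Map<Integer, Integer> initial = Map.of(1, 3, 2, 6, 3, 2, 4, 1);
        System.out.println(propagateMax(edges, initial)); // every vertex ends at 6
    }
}
```

Keeping the outbox separate from the inbox is what gives the BSP 
guarantee: nothing sent in a superstep is visible until the next one.
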
> - After going through the documentation for Hama and Giraph, I noticed 
> that both use Hadoop as the underlying framework. In both Hama 
> and Giraph an MR job is submitted. Does each superstep in BSP 
> correspond to a job in MR? Where are the incoming and outgoing messages 
> and the state stored: HDFS, HBase, local storage, or something pluggable?
>
My understanding of Hama is that it has its own BSP framework.  
Giraph, by contrast, runs on an existing Hadoop installation and does not 
have its own computational framework.  A Giraph job is submitted to 
Hadoop as a single map-only job, and all of its supersteps run inside 
that one job, so a superstep does not correspond to a separate MR job.  
Hama will have its own BSP launching framework.
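
Because a Giraph job is one map-only Hadoop job, the same long-lived 
tasks loop through every superstep instead of a new job being launched 
per superstep. A rough Java sketch of that shape follows, assuming 
threads as stand-ins for map tasks and java.util.concurrent.CyclicBarrier 
as a stand-in for the real coordination between tasks; SingleJobSketch 
and runJob are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: each thread plays one long-lived map task acting as
 * a worker. Tasks start once and run every superstep themselves, meeting at
 * a barrier between supersteps; nothing is resubmitted per superstep.
 */
public class SingleJobSketch {

    /** Runs the "job" and returns how many supersteps all workers completed. */
    static int runJob(int numWorkers, int numSupersteps) throws Exception {
        AtomicInteger finishedSupersteps = new AtomicInteger();
        // The barrier action runs once per superstep, after every worker arrives.
        CyclicBarrier barrier =
                new CyclicBarrier(numWorkers, finishedSupersteps::incrementAndGet);
        ExecutorService tasks = Executors.newFixedThreadPool(numWorkers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int w = 0; w < numWorkers; w++) {
                results.add(tasks.submit(() -> {
                    int stepsRun = 0;
                    for (int s = 0; s < numSupersteps; s++) {
                        // Compute over this worker's vertices and send messages here.
                        stepsRun++;
                        barrier.await(); // wait until every worker finishes superstep s
                    }
                    return stepsRun;
                }));
            }
            for (Future<Integer> r : results) {
                if (r.get() != numSupersteps) {
                    throw new IllegalStateException("a worker skipped a superstep");
                }
            }
        } finally {
            tasks.shutdown();
        }
        return finishedSupersteps.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runJob(4, 5)); // prints 5
    }
}
```
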

In Giraph, all of the state is stored in memory.  Graphs are loaded and 
stored through VertexInputFormat/VertexOutputFormat (very similar to 
Hadoop's InputFormat/OutputFormat).  You could implement your own 
VertexInputFormat/VertexOutputFormat to use HDFS, HBase, etc. as stable 
storage for your graph.
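
As a rough illustration of why the storage is pluggable, the sketch below 
hides where vertices come from behind a small interface. VertexSource, 
VertexSink, TextVertexSource, TextVertexSink, and the tab-separated line 
format are all hypothetical and much simpler than Giraph's actual 
VertexInputFormat/VertexOutputFormat classes; an HDFS- or HBase-backed 
reader would plug in at the same interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-ins for pluggable graph input/output.
 * NOT Giraph's API: they only show that computation code depends on an
 * interface, so the backing store can be swapped.
 */
public class PluggableIoSketch {

    /** A loaded vertex: id, value, and ids of its out-edge targets. */
    record Vertex(long id, double value, List<Long> targets) {}

    /** Hypothetical reader abstraction; other storage backends would implement it too. */
    interface VertexSource {
        List<Vertex> readAll() throws IOException;
    }

    /** Hypothetical writer abstraction for the final vertex values. */
    interface VertexSink {
        void writeAll(List<Vertex> vertices) throws IOException;
    }

    /** Reads lines in an assumed format: "id<TAB>value<TAB>t1,t2,...". */
    static final class TextVertexSource implements VertexSource {
        private final Reader in;

        TextVertexSource(Reader in) { this.in = in; }

        @Override
        public List<Vertex> readAll() throws IOException {
            List<Vertex> vertices = new ArrayList<>();
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) continue;
                String[] fields = line.split("\t", -1);
                List<Long> targets = new ArrayList<>();
                if (fields.length > 2 && !fields[2].isEmpty()) {
                    for (String t : fields[2].split(",")) {
                        targets.add(Long.parseLong(t.trim()));
                    }
                }
                vertices.add(new Vertex(Long.parseLong(fields[0]),
                        Double.parseDouble(fields[1]), targets));
            }
            return vertices;
        }
    }

    /** Writes "id<TAB>value" lines, one per vertex. */
    static final class TextVertexSink implements VertexSink {
        private final Writer out;

        TextVertexSink(Writer out) { this.out = out; }

        @Override
        public void writeAll(List<Vertex> vertices) throws IOException {
            for (Vertex v : vertices) {
                out.write(v.id() + "\t" + v.value() + "\n");
            }
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        VertexSource source = new TextVertexSource(new StringReader("1\t0.5\t2,3\n2\t1.0\t\n"));
        List<Vertex> graph = source.readAll();
        StringWriter sink = new StringWriter();
        new TextVertexSink(sink).writeAll(graph);
        System.out.print(sink);
    }
}
```
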

> - If a vertex is deactivated and then reactivated after receiving a 
> message, does it run on the same node or on a different node in the cluster?
>
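In Giraph, vertices can move between workers from one superstep to the 
next.  A vertex always runs on whichever worker it is assigned to in the 
current superstep, so a reactivated vertex may or may not run on the same 
node as before.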
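
A minimal sketch of that idea, under assumed rules that are not taken 
from Giraph's partitioner code: each vertex hashes to a fixed partition, 
and a per-superstep table (hypothetical here) records which worker owns 
each partition. In this model, where a reactivated vertex runs depends 
only on who owns its partition in the current superstep, not on whether 
the vertex was halted before. AssignmentSketch, partitionFor, workerFor, 
and the worker names are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical sketch of where a vertex runs: fixed vertex -> partition
 * hashing plus a partition -> worker table that may change between
 * supersteps.
 */
public class AssignmentSketch {

    /** Fixed rule: the same vertex id always lands in the same partition. */
    static int partitionFor(long vertexId, int numPartitions) {
        return Math.floorMod(Long.hashCode(vertexId), numPartitions);
    }

    /**
     * Worker that runs the vertex in a given superstep; ownerOfPartition.get(p)
     * names the worker owning partition p during that superstep.
     */
    static String workerFor(long vertexId, List<String> ownerOfPartition) {
        return ownerOfPartition.get(partitionFor(vertexId, ownerOfPartition.size()));
    }

    public static void main(String[] args) {
        // Hypothetical owner tables; partition 2 moves from worker-a to worker-b.
        List<String> superstep3 = List.of("worker-a", "worker-b", "worker-a", "worker-b");
        List<String> superstep4 = List.of("worker-a", "worker-b", "worker-b", "worker-b");
        long vertexId = 6L; // 6 mod 4 = partition 2
        System.out.println(workerFor(vertexId, superstep3)); // worker-a
        System.out.println(workerFor(vertexId, superstep4)); // worker-b
    }
}
```
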

> Regards,
> Praveen


--------------010909070408080203090600--