Answers inline. Hope that helps!
On 12/8/11 10:16 PM, Praveen Sripati wrote:
I suppose you can think of it that way. I like to compare a BSP
superstep to a MapReduce job since it's computation and
I know
MapReduce/Hadoop and am trying to get my head around
BSP/Hama-Giraph by comparing MR and BSP.
- The Map phase in
MR is similar to the computation phase in BSP. BSP allows
processes to exchange data in the communication phase, but
there is no communication between the mappers in the Map
phase, although data does flow from Map tasks to Reducer
tasks. Please correct me if I am wrong. Any other
- After going
through the documentation for Hama and Giraph, I noticed that
they both use Hadoop as the underlying framework. In both
Hama and Giraph an MR job is submitted. Does each superstep
in BSP correspond to a job in MR? Where are the incoming and
outgoing messages and the state stored - HDFS or HBase or Local
My understanding is that Hama has its own BSP framework.
Giraph runs on a Hadoop installation; it does not have its own
computational framework. A Giraph job is submitted to a Hadoop
installation as a Map-only job. Hama will have its own BSP launching
In Giraph, all of the state is stored in memory. Graphs are
loaded/stored through VertexInputFormat/VertexOutputFormat (very
similar to Hadoop's InputFormat/OutputFormat). You could implement
your own VertexInputFormat/VertexOutputFormat to use HDFS, HBase,
etc. as your graph stable storage.
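
To make the "state in memory, storage pluggable" idea concrete, here is a
minimal Python sketch. It is not Giraph code: GraphInput, GraphOutput,
TextGraphInput, TextGraphOutput and run_job are hypothetical names that only
stand in for the roles VertexInputFormat/VertexOutputFormat play, and the
one-vertex-per-line text format is an assumption made for the example.

```python
import io
from typing import Callable, Dict, Iterable, List, Protocol, Tuple

Vertex = Tuple[str, int, List[str]]          # (id, value, neighbor ids)
State = Dict[str, Tuple[int, List[str]]]     # id -> (value, neighbor ids)


class GraphInput(Protocol):                  # hypothetical input-format role
    def read(self) -> Iterable[Vertex]: ...


class GraphOutput(Protocol):                 # hypothetical output-format role
    def write(self, vertices: Iterable[Vertex]) -> None: ...


class TextGraphInput:
    """Reads lines of the assumed form 'id value n1,n2,...'."""

    def __init__(self, stream):
        self.stream = stream

    def read(self):
        for line in self.stream:
            parts = line.split()
            if not parts:
                continue                     # skip blank lines
            neighbors = parts[2].split(",") if len(parts) > 2 else []
            yield parts[0], int(parts[1]), neighbors


class TextGraphOutput:
    """Writes vertices back in the same assumed text format."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, vertices):
        for vid, value, neighbors in vertices:
            line = f"{vid} {value} {','.join(neighbors)}".rstrip()
            self.stream.write(line + "\n")


def run_job(source: GraphInput, sink: GraphOutput,
            compute: Callable[[State], State]) -> None:
    # The whole graph is loaded into memory before any computation runs.
    state = {vid: (value, nbrs) for vid, value, nbrs in source.read()}
    state = compute(state)
    # Only the final result goes back to stable storage.
    sink.write((vid, v, n) for vid, (v, n) in sorted(state.items()))


# Usage: in-memory streams stand in for HDFS/HBase-backed storage.
src = io.StringIO("a 1 b\nb 2 a\n")
out = io.StringIO()
run_job(TextGraphInput(src), TextGraphOutput(out),
        lambda s: {k: (v + 1, n) for k, (v, n) in s.items()})
```

Swapping the storage backend means swapping the two reader/writer objects;
run_job and the compute function stay the same, which is the point of keeping
the formats pluggable.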
- If a Vertex is
deactivated and again activated after receiving a message,
does it run on the same node or a different node in the
In Giraph, vertices can move between workers from one superstep to
the next. A vertex runs on whichever worker it is assigned to.
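
The halt/reactivate behaviour and the per-superstep worker assignment can be
sketched with a toy BSP loop in Python. This is only a model under stated
assumptions, not Giraph's implementation: run_max_value is a hypothetical
function, the algorithm is the usual "propagate the maximum value" example,
and the rotating vertex-to-worker placement is an arbitrary rule chosen just
to show that the assignment can differ between supersteps.

```python
def run_max_value(values, edges, num_workers=2):
    """Propagate the maximum value through a graph in BSP supersteps."""
    values = dict(values)                 # do not mutate the caller's dict
    active = set(values)                  # every vertex starts active
    inbox = {v: [] for v in values}       # messages visible this superstep
    placements = []                       # vertex -> worker, per superstep
    superstep = 0
    while active:
        # Assumption: placement is recomputed each superstep. The rotation
        # below is illustrative only, not how Giraph partitions vertices.
        placement = {v: (i + superstep) % num_workers
                     for i, v in enumerate(sorted(values))}
        placements.append(placement)

        outbox = {v: [] for v in values}
        for v in sorted(active):
            # Computation phase: a vertex sees only last superstep's messages.
            new_value = max([values[v]] + inbox[v])
            if superstep == 0 or new_value > values[v]:
                values[v] = new_value
                for n in edges[v]:
                    outbox[n].append(new_value)
            # Every vertex votes to halt after computing.

        # Barrier: messages sent in this superstep are delivered, and a
        # halted vertex is reactivated only if it received a message.
        inbox = outbox
        active = {v for v in values if inbox[v]}
        superstep += 1
    return values, superstep, placements


# Usage: a four-vertex chain a - b - c - d.
graph_values = {"a": 3, "b": 6, "c": 2, "d": 1}
graph_edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final, steps, placements = run_max_value(graph_values, graph_edges)
```

In this model a vertex that has halted does no work until a message arrives,
and when it wakes up it runs on whatever worker the placement for that
superstep names, which may or may not be the one it ran on before.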