incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Comparing BSP and MR
Date Sat, 10 Dec 2011 17:14:41 GMT
On 12/9/11 10:22 PM, Praveen Sripati wrote:
> Jack,
>
> > Giraph maps do communicate: via RPC.  This is done repeatedly in 
> every mapper, during the compute phase.  This is something that is not 
> normal to MapReduce; it is special to Giraph.
>
> There must definitely have been some thought around this. But we could 
> also have a mapper correspond to just the computation phase in a 
> superstep and avoid communication between the mappers, as in MapReduce, 
> and later spawn another set of mappers for the next superstep. There 
> might be some reason why communication between mappers was avoided in MR.
>
Communication between mappers is not part of the MapReduce computing 
model.  Therefore, it doesn't make sense for the framework to include 
it, as it would unnecessarily complicate fault-tolerance recovery.
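
Jake's point about RPC is worth making concrete.  Here is a rough sketch 
of a vertex compute() method, modeled on the shortest-paths example that 
ships with Giraph; the base class, package, and method names follow the 
0.1-era API as best I recall, so treat them as assumptions rather than 
the definitive interface.

import java.io.IOException;
import java.util.Iterator;

import org.apache.giraph.graph.EdgeListVertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

/**
 * Sketch only: each vertex sums its incoming messages and forwards the
 * result to its neighbors.  The sendMsgToAllEdges() call is what goes
 * out over RPC to whichever workers host the target vertices, and all
 * of it happens inside the single long-running map task.
 */
public class SumAndForwardVertex extends
    EdgeListVertex<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  @Override
  public void compute(Iterator<DoubleWritable> messages) throws IOException {
    double sum = 0;
    while (messages.hasNext()) {
      sum += messages.next().get();
    }
    setVertexValue(new DoubleWritable(sum));

    if (getSuperstep() < 10) {
      // Messages cross the wire now but are only delivered to their
      // targets at the start of the next superstep, after the barrier.
      sendMsgToAllEdges(getVertexValue());
    } else {
      voteToHalt();
    }
  }
}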

> Any thoughts?
>
> Regards,
> Praveen
>
> On Sat, Dec 10, 2011 at 10:35 AM, Jake Mannix <jake.mannix@gmail.com> wrote:
>
>
>
>     On Fri, Dec 9, 2011 at 8:16 PM, Praveen Sripati
>     <praveensripati@gmail.com> wrote:
>
>         Jake,
>
>
>         > Let's not crosspost, please, it makes the thread of
>         conversation totally opaque as to who is talking about what.
>
>         Agree. I got it after the OP.
>
>
>         > There is only one set of map tasks for the Giraph job -
>         those long-running map tasks run possibly many supersteps.
>
>         OK. But map tasks don't communicate with each other. How are
>         messages sent across in the communication phase of a superstep
>         that happens within a map?
>
>
>     Giraph maps do communicate: via RPC.  This is done repeatedly in
>     every mapper, during the compute phase.  This is something that is
>     not normal to MapReduce; it is special to Giraph.
>
>         > In Giraph, vertices can move around workers between
>         supersteps.  A vertex will run on the worker that it is
>         assigned to.
>
>         Is there any advantage to moving the processing of vertices
>         from one worker to another? Can't there be affinity between a
>         worker and the vertices it processes?
>
>
>     Often there will be affinity, but if the graph itself evolves
>     during computation (some sort of iterative pruning or clustering),
>     then moving around may make sense.  Also: if nodes die.
>       -jake
>
>
>         Regards,
>         Praveen
>
>         On Fri, Dec 9, 2011 at 11:33 PM, Jake Mannix
>         <jake.mannix@gmail.com> wrote:
>
>             [hama-user to bcc:]
>
>             Let's not crosspost, please, it makes the thread of
>             conversation totally opaque as to who is talking about what.
>
>             On Fri, Dec 9, 2011 at 1:42 AM, Praveen Sripati
>             <praveensripati@gmail.com> wrote:
>
>                 Thanks to Thomas and Avery for the response.
>
>                 > For Giraph you are quite correct, all the stuff is
>                 submitted as a MR job. But a full map stage is not a
>                 superstep; the whole computation is done in one
>                 mapping phase.
>
>                 So a map task in MR corresponds to a computation phase
>                 in a superstep. Once the computation phase for a
>                 superstep is complete, the vertex output is stored
>                 using the defined OutputFormat, the messages are sent
>                 (maybe) to other vertices, and the map task is stopped.
>                 Once the barrier synchronization phase is complete,
>                 another set of map tasks is invoked for the vertices
>                 which have received a message.
>
>
>             In Giraph, each superstep does not lead to storage into an
>             OutputFormat.  The data all lives in memory from the time
>             the first superstep starts to the time the final superstep
>             stops (except that, for fault tolerance, checkpoints are
>             stored to disk at user-specified intervals).  There is
>             only one set of map tasks for the Giraph job - those
>             long-running map tasks may run many supersteps.
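
As an aside on those user-specified intervals: the checkpoint frequency 
and the size of that one set of map tasks are plain job settings.  Below 
is a minimal driver sketch, assuming the 0.1-era GiraphJob API and the 
giraph.checkpointFrequency property; the names may differ in your 
release, and the input/output format classes are hypothetical.

import org.apache.giraph.graph.GiraphJob;

// Hypothetical driver fragment; method and property names are
// assumptions based on the 0.1-era API, so check your release.
public class CheckpointingDriver {
  public static void main(String[] args) throws Exception {
    GiraphJob job = new GiraphJob("sum-and-forward");
    job.setVertexClass(SumAndForwardVertex.class);
    // Hypothetical input/output formats (see the input format sketch below).
    job.setVertexInputFormatClass(HBaseVertexInputFormat.class);
    job.setVertexOutputFormatClass(SimpleTextVertexOutputFormat.class);
    // One set of long-running map tasks: min 30, max 30, wait for 100%.
    job.setWorkerConfiguration(30, 30, 100.0f);
    // Checkpoint the in-memory graph state to HDFS every 5 supersteps.
    job.getConfiguration().setInt("giraph.checkpointFrequency", 5);
    System.exit(job.run(true) ? 0 : 1);
  }
}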
>
>                 In a regular MR job (not Giraph), the number of map
>                 tasks equals the number of InputSplits. But in the case
>                 of Giraph, the total number of maps to be launched is
>                 usually more than the number of input vertices.
>
>
>             Number of maps > number of input vertices?  Not at all.
>              That would be insane.  We want to be able to run over
>             multi-billion vertex graphs.  We're going to launch
>             multiple billions of mappers?   The splitting of the data
>             in Giraph is very similar to that in a regular MR job: divide
>             up your input data among the number of mappers you have,
>             and you're off and running.
>
>
>                 > Where are the incoming, outgoing messages and state
>                 stored
>                 > Memory
>
>                 What happens if a particular node is lost in the case
>                 of Hama and Giraph? Are the messages not persisted
>                 somewhere to be fetched later?
>
>
>             If nodes are lost, the system has to back up to the most
>             recent checkpoint, where graph state has been persisted to
>             HDFS.  Messages are not currently persisted, but the state
>             the graph was in to produce any messages was.
>
>                 > In Giraph, vertices can move around workers between
>                 supersteps.  A vertex will run on the worker that it
>                 is assigned to.
>
>                 Is data locality considered while moving vertices
>                 around workers in Giraph?
>
>
>             Data is all in memory, and typical graph algorithms are
>             basically sending roughly the size of the entire graph
>             (number of total edges) out over distributed RPC in any
>             given superstep, so shuffling the graph around by RPC is
>             not much more work.
>
>
>                 > As you can see, you could write a MapReduce Engine
>                 with BSP on top of Apache Hama.
>
>                 It's done the other way around: BSP is implemented in
>                 Giraph using Hadoop.
>
>
>             I'll let the Hama people explain to you about how one
>             would implement MR on top of Hama.  You are correct that
>             in Giraph, the Hadoop JobTracker/TaskTracker and HDFS are
>             used as a substrate to help implement BSP (although I would
>             not say that "MR" is being used to implement BSP, as there
>             is no MR going on in Giraph).
>
>               -jake
>
>
>
>                 Praveen
>
>                 On Fri, Dec 9, 2011 at 12:51 PM, Avery Ching
>                 <aching@apache.org> wrote:
>
>                     Hi Praveen,
>
>                     Answers inline.  Hope that helps!
>
>                     Avery
>
>                     On 12/8/11 10:16 PM, Praveen Sripati wrote:
>>                     Hi,
>>
>>                     I know about MapReduce/Hadoop and trying to get
>>                     myself around BSP/Hama-Giraph by comparing MR and
>>                     BSP.
>>
>>                     - Map Phase in MR is similar to Computation Phase
>>                     in BSP. BSP allows for process to exchange data
>>                     in the communication phase, but there is no
>>                     communication between the mappers in the Map
>>                     Phase. Though the data flows from Map tasks to
>>                     Reducer tasks. Please correct me if I am wrong.
>>                     Any other significant differences?
>                     I suppose you can think of it that way.  I like to
>                     compare a BSP superstep to a MapReduce job since
>                     it involves both computation and communication.
>>                     - After going through the documentation for Hama
>>                     and Giraph, noticed that they both use Hadoop as
>>                     the underlying framework. In both Hama and Giraph
>>                     an MR Job is submitted. Does each superstep in
>>                     BSP correspond to a Job in MR? Where are the
>>                     incoming, outgoing messages and state stored -
>>                     HDFS or HBase or Local or pluggable?
>>
>                     My understanding of Hama is that they have their
>                     own BSP framework.  Giraph can be run on a Hadoop
>                     installation; it does not have its own
>                     computational framework.  A Giraph job is
>                     submitted to a Hadoop installation as a Map-only
>                     job.  Hama will have its own BSP launching framework.
>
>                     In Giraph, the state is all stored in memory.
>                     Graphs are loaded/stored through
>                     VertexInputFormat/VertexOutputFormat (very similar
>                     to Hadoop).  You could implement your own
>                     VertexInputFormat/VertexOutputFormat to use HDFS,
>                     HBase, etc. as your graph's stable storage.
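
If it helps to see what "pluggable" means here, a skeleton of a custom 
input format might look like the following.  The getSplits() plus 
createVertexReader() contract is how I remember the current 
VertexInputFormat; the exact generics and signatures may differ in your 
checkout, and the actual HBase scanning logic is left out.

import java.io.IOException;
import java.util.List;

import org.apache.giraph.graph.VertexInputFormat;
import org.apache.giraph.graph.VertexReader;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

/**
 * Hypothetical skeleton: load vertices from HBase (or any other store)
 * instead of HDFS text files.  Only the shape of the contract is shown.
 */
public class HBaseVertexInputFormat extends
    VertexInputFormat<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  @Override
  public List<InputSplit> getSplits(JobContext context, int numWorkers)
      throws IOException, InterruptedException {
    // Carve the table (or key range) into roughly one split per worker.
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public VertexReader<LongWritable, DoubleWritable, FloatWritable, DoubleWritable>
      createVertexReader(InputSplit split, TaskAttemptContext context)
      throws IOException {
    // Return a reader that scans its split and materializes each vertex
    // with its value and out-edges.
    throw new UnsupportedOperationException("sketch only");
  }
}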
>
>>                     - If a Vertex is deactivated and again activated
>>                     after receiving a message, does it run on the
>>                     same node or a different node in the cluster?
>>
>                     In Giraph, vertices can move around workers
>                     between supersteps.  A vertex will run on the
>                     worker that it is assigned to.
>
>>                     Regards,
>>                     Praveen
>
>
>
>
>
>

