incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Comparing BSP and MR
Date Sat, 10 Dec 2011 17:38:04 GMT
You can certainly implement BSP on top of a MapReduce implementation.  
But this is going to be very expensive.  Consider that all 
communication in MapReduce goes through the phase of storing map 
outputs locally (on disk) before being sent to the reducer.  Also, consider 
that the entire graph must be loaded and stored during each MapReduce 
job.  In Giraph, the graph is loaded once prior to the first superstep 
and stored once at the end of the last superstep.  All messaging is 
currently done in memory.
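
A toy sketch of that BSP flow (plain Python of my own, not Giraph's actual API; the `compute` signature and the ring example are made up for illustration): the graph is loaded once, messages live in memory between supersteps, and the result is stored once at the end.

```python
# Toy model of Giraph-style BSP: graph loaded once, messages kept
# in memory between supersteps, result stored once at the end.

def run_bsp(graph, compute, max_supersteps):
    # graph: dict vertex_id -> value, loaded once before superstep 0
    messages = {v: [] for v in graph}           # in-memory inboxes
    for superstep in range(max_supersteps):
        outgoing = {v: [] for v in graph}
        active = False
        for v, value in graph.items():
            new_value, sends = compute(superstep, v, value, messages[v])
            graph[v] = new_value
            for target, msg in sends:
                outgoing[target].append(msg)
                active = True
        messages = outgoing                     # barrier: deliver messages
        if not active:                          # no messages -> halt
            break
    return graph                                # stored once, at the end

# Example: propagate the maximum value around a 4-vertex ring.
def max_compute(superstep, v, value, inbox):
    new_value = max([value] + inbox)
    if superstep == 0 or new_value > value:
        return new_value, [((v + 1) % 4, new_value)]
    return new_value, []

print(run_bsp({0: 3, 1: 7, 2: 1, 3: 5}, max_compute, 10))
# -> {0: 7, 1: 7, 2: 7, 3: 7}
```

Note that nothing here ever hits disk between supersteps; in a chain of MapReduce jobs, every one of those message deliveries would go through local disk and HDFS.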

Another way to think about it: Giraph degenerates into a MapReduce 
implementation if you checkpoint every superstep (minus the 
re-loading of the graph).
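
To put rough numbers on that (a back-of-the-envelope sketch of my own, with a hypothetical `checkpoint_interval` parameter standing in for Giraph's user-configurable checkpoint frequency): count how often the full graph touches stable storage under each model.

```python
# Toy accounting of full-graph load/store operations against stable
# storage (HDFS), comparing an in-memory BSP run, a BSP run that
# checkpoints every superstep, and a chain of MapReduce jobs.

def giraph_io_ops(num_supersteps, checkpoint_interval=None):
    """checkpoint_interval=None models a pure in-memory run;
    checkpoint_interval=1 models checkpointing every superstep."""
    ops = 1  # load the graph once, before the first superstep
    if checkpoint_interval:
        ops += sum(1 for s in range(1, num_supersteps + 1)
                   if s % checkpoint_interval == 0)
    ops += 1  # store the result once, after the last superstep
    return ops

def mr_chain_io_ops(num_supersteps):
    # A chain of MapReduce jobs loads and stores the graph in every job.
    return 2 * num_supersteps

print(giraph_io_ops(30))                         # 2: load once, store once
print(giraph_io_ops(30, checkpoint_interval=1))  # 32: a checkpoint per superstep
print(mr_chain_io_ops(30))                       # 60: load + store per job
```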

Hope that helps,

Avery

On 12/10/11 9:26 AM, Praveen Sripati wrote:
> Avery,
>
> > Communication between mappers is not part of the MapReduce computing 
> model.  Therefore, it doesn't make sense for them to include it as it 
> would unnecessarily complicate the fault-tolerance recovery.
>
> I agree that it doesn't make sense to complicate things by introducing 
> communication between mappers.
>
> But my original query was: why can't the RPC communication between 
> mappers in Giraph be avoided, as in MR, by not running multiple 
> supersteps in a single map process? If I am not wrong, Hadoop supports 
> MMR (multiple-map) type jobs, and each map could correspond to the 
> computation phase of a single superstep. Agreed, there is an overhead 
> of launching maps again and again, but communication between the 
> mappers can be avoided.
>
> I was trying to figure out the rationale behind the approach taken in 
> Giraph.
>
> Regards,
> Praveen
>
> On Sat, Dec 10, 2011 at 10:44 PM, Avery Ching <aching@apache.org 
> <mailto:aching@apache.org>> wrote:
>
>     On 12/9/11 10:22 PM, Praveen Sripati wrote:
>>     Jack,
>>
>>     > Giraph maps do communicate: via RPC.  This is done repeatedly
>>     in every mapper, during the compute phase.  This is something
>>     that is not normal to MapReduce, it is special to Giraph.
>>
>>     There must definitely have been some thought around this. But we
>>     can also have a mapper correspond to just the computation phase
>>     in a superstep and avoid communication between the mappers, as in
>>     MapReduce, and later spawn another set of mappers for the next
>>     superstep. There might be some reason why communication between
>>     mappers was avoided in MR.
>>
>     Communication between mappers is not part of the MapReduce
>     computing model.  Therefore, it doesn't make sense for them to
>     include it as it would unnecessarily complicate the
>     fault-tolerance recovery.
>
>
>>     Any thoughts?
>>
>>     Regards,
>>     Praveen
>>
>>     On Sat, Dec 10, 2011 at 10:35 AM, Jake Mannix
>>     <jake.mannix@gmail.com <mailto:jake.mannix@gmail.com>> wrote:
>>
>>
>>
>>         On Fri, Dec 9, 2011 at 8:16 PM, Praveen Sripati
>>         <praveensripati@gmail.com <mailto:praveensripati@gmail.com>>
>>         wrote:
>>
>>             Jake,
>>
>>
>>             > Let's not crosspost, please; it makes the thread of
>>             conversation totally opaque as to who is talking about what.
>>
>>             Agree. I got it after the OP.
>>
>>
>>             > There is only one set of map tasks for the Giraph job -
>>             those long-running map tasks run possibly many supersteps.
>>
>>             OK. But, map tasks don't communicate with each other. How
>>             are messages sent across in the communication phase of a
>>             super step that happens within a map?
>>
>>
>>         Giraph maps do communicate: via RPC.  This is done repeatedly
>>         in every mapper, during the compute phase.  This is something
>>         that is not normal to MapReduce, it is special to Giraph.
>>
>>             > In Giraph, vertices can move around workers between
>>             supersteps.  A vertex will run on the worker that it is
>>             assigned to.
>>
>>             Is there any advantage to moving the processing of
>>             vertices from one worker to another? Can't there be
>>             affinity between a worker and the vertices it processes?
>>
>>
>>         Often there will be affinity, but if the graph itself evolves
>>         during computation (some sort of iterative pruning or
>>         clustering), then moving around may make sense.  Also: if
>>         nodes die.
>>           -jake
>>
>>
>>             Regards,
>>             Praveen
>>
>>             On Fri, Dec 9, 2011 at 11:33 PM, Jake Mannix
>>             <jake.mannix@gmail.com <mailto:jake.mannix@gmail.com>> wrote:
>>
>>                 [hama-user to bcc:]
>>
>>                 Let's not crosspost, please; it makes the thread of
>>                 conversation totally opaque as to who is talking
>>                 about what.
>>
>>                 On Fri, Dec 9, 2011 at 1:42 AM, Praveen Sripati
>>                 <praveensripati@gmail.com
>>                 <mailto:praveensripati@gmail.com>> wrote:
>>
>>                     Thanks to Thomas and Avery for the response.
>>
>>                     > For Giraph you are quite correct, all the stuff
>>                     is submitted as a MR job. But a full map stage is
>>                     not a superstep, the whole computation is a done
>>                     in one mapping phase.
>>
>>                     So a map task in MR corresponds to a computation
>>                     phase in a superstep. Once the computation phase
>>                     for a superstep is complete, the vertex output is
>>                     stored using the defined OutputFormat, messages
>>                     are (maybe) sent to other vertices, and the
>>                     map task is stopped. Once the barrier
>>                     synchronization phase is complete, another set of
>>                     map tasks are invoked for the vertices which have
>>                     received a message.
>>
>>
>>                 In Giraph, each superstep does not lead to storage
>>                 into an OutputFormat.  The data lives all in memory
>>                 from the time the first superstep starts to the time
>>                 the final superstep stops (except that for tolerance
>>                 of failures, checkpoints are stored to disk at
>>                 user-specified intervals).  There is only one set of
>>                 map tasks for the Giraph job - those long-running map
>>                 tasks run possibly many supersteps.
>>
>>                     In a regular MR Job (not Giraph) the number of
>>                     Map tasks equals the number of InputSplits.
>>                     But, in case of Giraph the total number of maps
>>                     to be launched is usually more than the number of
>>                     input vertices.
>>
>>
>>                 Number of maps > number of input vertices?  Not at
>>                 all.  That would be insane.  We want to be able to
>>                 run over multi-billion vertex graphs.  We're going to
>>                 launch multiple billions of mappers?   The splitting
>>                 of the data in Giraph is very similar to a regular
>>                 MR job: divide up your input data among the number of
>>                 mappers you have, and you're off and running.
>>
>>
>>                     > Where are the incoming, outgoing messages and
>>                     state stored
>>                     > Memory
>>
>>                     What happens if a particular node is lost in case
>>                     of Hama and Giraph? Are the messages not
>>                     persisted somewhere to be fetched later.
>>
>>
>>                 If nodes are lost, the system has to back up to the
>>                 most recent checkpoint, where the graph state has been
>>                 persisted to HDFS.  Messages are not currently
>>                 persisted, but the graph state from which any
>>                 messages were produced was.
>>
>>                     > In Giraph, vertices can move around workers
>>                     between supersteps.  A vertex will run on the
>>                     worker that it is assigned to.
>>
>>                     Is data locality considered while moving vertices
>>                     around workers in Giraph?
>>
>>
>>                 Data is all in memory, and typical graph algorithms
>>                 basically send roughly the size of the entire
>>                 graph (the number of total edges) over distributed
>>                 RPC in any given superstep, so shuffling the graph
>>                 around by RPC is not much extra work.
>>
>>
>>                     > As you can see, you could write a MapReduce
>>                     Engine with BSP on top of Apache Hama.
>>
>>                     It's being done the other way: BSP is implemented
>>                     in Giraph using Hadoop.
>>
>>
>>                 I'll let the Hama people explain to you about how one
>>                 would implement MR on top of Hama.  You are correct
>>                 that in Giraph, the Hadoop JobTracker/TaskTracker and
>>                 HDFS are used as substrate to help implement BSP
>>                 (although I would not say that "MR" is being used to
>>                 implement BSP, as there is no MR going on in Giraph).
>>
>>                   -jake
>>
>>
>>
>>                     Praveen
>>
>>                     On Fri, Dec 9, 2011 at 12:51 PM, Avery Ching
>>                     <aching@apache.org <mailto:aching@apache.org>> wrote:
>>
>>                         Hi Praveen,
>>
>>                         Answers inline.  Hope that helps!
>>
>>                         Avery
>>
>>                         On 12/8/11 10:16 PM, Praveen Sripati wrote:
>>>                         Hi,
>>>
>>>                         I know about MapReduce/Hadoop and trying to
>>>                         get myself around BSP/Hama-Giraph by
>>>                         comparing MR and BSP.
>>>
>>>                         - Map Phase in MR is similar to Computation
>>>                         Phase in BSP. BSP allows for process to
>>>                         exchange data in the communication phase,
>>>                         but there is no communication between the
>>>                         mappers in the Map Phase. Though the data
>>>                         flows from Map tasks to Reducer tasks.
>>>                         Please correct me if I am wrong. Any other
>>>                         significant differences?
>>                         I suppose you can think of it that way.  I
>>                         like to compare a BSP superstep to a
>>                         MapReduce job since it's computation and
>>                         communication.
>>>                         - After going through the documentation for
>>>                         Hama and Giraph, noticed that they both use
>>>                         Hadoop as the underlying framework. In both
>>>                         Hama and Giraph an MR Job is submitted. Does
>>>                         each superstep in BSP correspond to a Job in
>>>                         MR? Where are the incoming, outgoing
>>>                         messages and state stored - HDFS or HBase or
>>>                         Local or pluggable?
>>>
>>                         My understanding of Hama is that they have
>>                         their own BSP framework.  Giraph can be run
>>                         on a Hadoop installation; it does not have
>>                         its own computational framework.  A Giraph
>>                         job is submitted to a Hadoop installation as
>>                         a Map-only job.  Hama will have its own BSP
>>                         launching framework.
>>
>>                         In Giraph, the state is stored all in
>>                         memory.  Graphs are loaded/stored through
>>                         VertexInputFormat/VertexOutputFormat (very
>>                         similar to Hadoop).  You could implement your
>>                         own VertexInputFormat/VertexOutputFormat to
>>                         use HDFS, HBase, etc. as your graph's stable
>>                         storage.
>>
>>>                         - If a Vertex is deactivated and again
>>>                         activated after receiving a message, does it
>>>                         run on the same node or a different node in
>>>                         the cluster?
>>>
>>                         In Giraph, vertices can move around workers
>>                         between supersteps.  A vertex will run on the
>>                         worker that it is assigned to.
>>
>>>                         Regards,
>>>                         Praveen
>>
>>
>>
>>
>>
>>
>
>

