giraph-user mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: Can I use Giraph on an application with two maps but no reduce?
Date Thu, 02 May 2013 06:09:00 GMT
The question is: do you have 100GB of main memory? How big are your
messages going to be? How dense is the graph?
Although we have out-of-core facilities, this does not look to me like a
typical graph algorithm, and in particular not one that would benefit much
from Giraph compared to MapReduce. It has a low number of iterations (two),
so, particularly if you have memory constraints, it could work out pretty
easily with MapReduce. It also looks to me like a map/reduce job, where the
reducer could do the second iteration, but I could be missing some details.
As far as load balancing is concerned, I guess it depends on your degree
distribution. Having a "random" distribution of vertices through hash
partitioning should help, but if you have a bunch of nodes that are much
more active, you could have some stragglers.
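
To make the map/reduce formulation above concrete, here is a minimal
in-memory sketch in plain Python (not Hadoop or Giraph code). The map step
evaluates each node once and emits its result keyed by every incident edge;
the reduce step receives both endpoint results for an edge and computes the
edge result. The functions evaluate_node and evaluate_edge, and the toy
adjacency list, are hypothetical placeholders for the image processing
described in the original question.

```python
from collections import defaultdict


def evaluate_node(node_id):
    # Hypothetical stand-in for "read an image and compute the node result".
    return node_id * node_id


def evaluate_edge(result_u, result_v):
    # Hypothetical stand-in for the per-edge computation.
    return result_u + result_v


def map_phase(adjacency):
    """Evaluate each node once, then emit its result keyed by each incident edge."""
    for node, neighbours in adjacency.items():
        result = evaluate_node(node)
        for other in neighbours:
            # Canonical key so both endpoints land in the same group.
            edge = (min(node, other), max(node, other))
            yield edge, (node, result)


def shuffle(pairs):
    """Group map output by edge key, as the MapReduce shuffle would."""
    groups = defaultdict(dict)
    for edge, (node, result) in pairs:
        groups[edge][node] = result
    return groups


def reduce_phase(groups):
    """Each reducer call sees both endpoint results and evaluates the edge."""
    return {
        (u, v): evaluate_edge(results[u], results[v])
        for (u, v), results in groups.items()
    }


# Toy undirected graph: every edge is listed from both endpoints.
adjacency = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
edge_results = reduce_phase(shuffle(map_phase(adjacency)))
```

In this sketch a node result is copied once per incident edge during the
shuffle, so high-degree nodes produce the most traffic. That is the same
degree-distribution effect mentioned above for stragglers.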


On Thu, May 2, 2013 at 2:12 AM, Hadoop Explorer
<hadoopexplorer@outlook.com> wrote:

> I have an application that evaluates a graph using this algorithm:
>
> - use a parallel for loop to evaluate all nodes in a graph (to evaluate a
> node, an image is read, and then result of this node is calculated)
>
> - use a second parallel for loop to evaluate all edges in the graph.  The
> function would take in results from both nodes of the edge, and then
> calculate the answer for the edge
>
> The final result will consist of calculated results of each edge.  So each
> node, and each edge is essentially a job, and in this case, an edge is more
> like a job than a message
>
> As you can see, the above algorithm would employ two map functions, but no
> reduce function.  The total data size can be very large (say 100GB).  Also,
> the workload of each node and each edge is highly irregular, and thus load
> balancing mechanisms are essential.
>
> In this case, will Giraph suit this application?  If so, what will my
> program look like?  And will Giraph be able to strike the balance between a good
> load balancing of the second map function, and minimizing data transfer of
> the results from the first map function?
>
>
>


-- 
   Claudio Martella
   claudio.martella@gmail.com
