hadoop-hdfs-user mailing list archives

From Roman Shaposhnik <...@apache.org>
Subject Re: will an application with two maps but no reduce be suitable for hadoop?
Date Thu, 18 Apr 2013 15:29:59 GMT
On Thu, Apr 18, 2013 at 4:49 AM, Hadoop Explorer
<hadoopexplorer@outlook.com> wrote:
> I have an application that evaluates a graph using this algorithm:
> - use a parallel for loop to evaluate all nodes in the graph (to evaluate a
> node, an image is read, and then the result for that node is calculated)
> - use a second parallel for loop to evaluate all edges in the graph.  The
> function would take in the results from both nodes of the edge, and then
> calculate the answer for the edge
> As you can see, the above algorithm would employ two map functions, but no
> reduce function.  The total data size can be very large (say 100 GB).  Also,
> the workload of each node and each edge is highly irregular, and thus
> load-balancing mechanisms are essential.
> In this case, will Hadoop suit this application?  If so, what will the
> architecture of my program look like?  And will Hadoop be able to strike a
> balance between good load balancing in the second map function and
> minimizing data transfer of the results from the first map function?

Map-only jobs are well known in the Hadoop ecosystem; for example, that's
how Giraph implements BSP (Bulk Synchronous Parallel) on top of Hadoop.
In fact, from what you're describing, Giraph sounds like it could be a
good fit. Check it out.
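For reference, a map-only job is just an ordinary MapReduce job with the number of reducers set to zero. One way to run the node-evaluation pass is Hadoop Streaming with a Python mapper; the script below is a hedged sketch where the actual evaluation is a placeholder:

```python
#!/usr/bin/env python
# Hypothetical node-evaluation mapper for a map-only Hadoop Streaming job.
# Run with reducers disabled, e.g.:
#   hadoop jar hadoop-streaming.jar \
#     -D mapreduce.job.reduces=0 \
#     -input nodes.txt -output node_results \
#     -mapper node_mapper.py
import sys

def evaluate_node(node_id, image_path):
    """Placeholder for the real per-node computation (read an image,
    compute a result). Here a dummy score stands in."""
    return len(image_path)

def run(stream):
    # Each input line describes one node: "node_id<TAB>image_path".
    results = []
    for line in stream:
        node_id, image_path = line.rstrip("\n").split("\t")
        results.append((node_id, evaluate_node(node_id, image_path)))
    return results

if __name__ == "__main__":
    # Emit "node_id<TAB>result" lines; with zero reducers, mapper
    # output is written directly to HDFS.
    for node_id, score in run(sys.stdin):
        print(f"{node_id}\t{score}")
```

The edge pass can then be a second map-only job that reads the node results produced by the first.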

