hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mikhail Yakshin <greycat.na....@gmail.com>
Subject Re: Hadoop: Divide and Conquer Algorithms
Date Sun, 28 Feb 2010 21:00:32 GMT

>                I have a small question. I want to know how would one implement
> divide and conquer algorithms in Hadoop. For example suppose I want to implement
> merge sort 100 lines in hadoop. There will be 10 mapper each sorting 10 lines.
> Now comes the tough part
> In the traditional version of merge sort each piece of 10 lines is combined to
> form 5 pieces of 20 lines. The each piece of 20 lines is combined to form 3
> pieces of 40 lines and so on. I am unable to understand how to implement this
> functionality in the reducer.
> Any help would be welcome

You don't have to implement merge sort in reducer - it happens sort of
automatically between mapper and reducer. All the data gets sorted due
to grouping / partitioning needed to run reducers.

If you really mean some more generic "divide and conquer" problem,
then, I guess, Hadoop is not a best choice for parallelizing it. You
might want to solve it using several sequential Hadoop jobs, but it's
not very effective and it's pretty limited in terms of scalability.

WBR, Mikhail Yakshin

View raw message