hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mikhail Yakshin <greycat.na....@gmail.com>
Subject Re: Hadoop: Divide and Conquer Algorithms
Date Sun, 28 Feb 2010 21:00:32 GMT
Hi,

>                I have a small question. I want to know how would one implement
> divide and conquer algorithms in Hadoop. For example suppose I want to implement
> merge sort 100 lines in hadoop. There will be 10 mapper each sorting 10 lines.
> Now comes the tough part
>
> In the traditional version of merge sort each piece of 10 lines is combined to
> form 5 pieces of 20 lines. The each piece of 20 lines is combined to form 3
> pieces of 40 lines and so on. I am unable to understand how to implement this
> functionality in the reducer.
>
> Any help would be welcome

You don't have to implement merge sort in reducer - it happens sort of
automatically between mapper and reducer. All the data gets sorted due
to grouping / partitioning needed to run reducers.

If you really mean some more generic "divide and conquer" problem,
then, I guess, Hadoop is not a best choice for parallelizing it. You
might want to solve it using several sequential Hadoop jobs, but it's
not very effective and it's pretty limited in terms of scalability.

-- 
WBR, Mikhail Yakshin

Mime
View raw message