hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: reducer tasks start time issue
Date Sat, 22 Dec 2012 16:15:18 GMT
A reduce can't process the complete data set until it has fetched all
partitions, and any map may produce a partition for any reducer.
Hence, we generally wait until all maps have terminated, and their
partition outputs are ready and copied over to the reduces, before we
begin to group and process the keys.
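To make the barrier concrete, here is a minimal word-count simulation in Python. It is a sketch under stated assumptions, not Hadoop's implementation. `NUM_REDUCERS`, `partition`, `run_map`, `run_reduce` and `run_job` are hypothetical names. The crc32-based partitioner is an assumed stand-in for Hadoop's hash partitioning, chosen only so results are reproducible. The sketch runs all maps first and does not model the copy phase overlapping with maps that are still running. It shows that every map can write into every reducer's partition, and that reducing before all maps have finished gives partial totals.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # hypothetical reducer count for this sketch


def partition(key: str) -> int:
    """Pick the reducer for a key. crc32 is an assumed stand-in for
    Hadoop's hash partitioner, chosen so runs are reproducible."""
    return zlib.crc32(key.encode("utf-8")) % NUM_REDUCERS


def run_map(split: list[str]) -> dict[int, list[tuple[str, int]]]:
    """Word-count mapper: emit (word, 1) pairs, bucketed by target reducer."""
    buckets = defaultdict(list)
    for line in split:
        for word in line.split():
            buckets[partition(word)].append((word, 1))
    return dict(buckets)


def run_reduce(fetched: list[tuple[str, int]]) -> dict[str, int]:
    """Sort fetched pairs by key, group equal keys, and sum each group."""
    ordered = sorted(fetched, key=itemgetter(0))
    return {key: sum(count for _, count in group)
            for key, group in groupby(ordered, key=itemgetter(0))}


def run_job(splits: list[list[str]]) -> dict[str, int]:
    # Any map may produce a partition for any reducer.
    map_outputs = [run_map(split) for split in splits]
    result: dict[str, int] = {}
    for r in range(NUM_REDUCERS):
        # Barrier: reducer r gathers its partition from EVERY map
        # before it sorts, groups, and reduces.
        fetched = [pair for out in map_outputs for pair in out.get(r, [])]
        result.update(run_reduce(fetched))
    return result


if __name__ == "__main__":
    splits = [["a b a"], ["b c"], ["a c c"]]
    print(sorted(run_job(splits).items()))  # [('a', 3), ('b', 2), ('c', 3)]

    # Reducing after only the first map has finished yields partial,
    # wrong totals ('a' is 2 instead of 3, and 'c' is missing entirely).
    first = run_map(splits[0])
    print(run_reduce([p for pairs in first.values() for p in pairs]))  # {'a': 2, 'b': 1}
```

The early-reduce call at the end illustrates why grouping waits. A key's values may still be sitting in the output of a map that has not finished yet, so any per-key result computed earlier could be incomplete.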

However, given that you began thinking about this, this paper on
"Online" Hadoop may interest you:
http://www.neilconway.org/docs/nsdi2010_hop.pdf

On Sat, Dec 22, 2012 at 6:55 PM, Lin Ma <linlma@gmail.com> wrote:
> Hi guys,
>
> Suppose a Hadoop job has both mappers and reducers. My question
> is: is it true that reducer tasks cannot begin until all mapper tasks
> complete? If so, why is it designed this way?
>
> thanks in advance,
> Lin



-- 
Harsh J
