hadoop-common-user mailing list archives

From "Jeyendran Balakrishnan" <jbalakrish...@docomolabs-usa.com>
Subject RE: do all mappers finish before reducer starts
Date Mon, 01 Feb 2010 22:59:16 GMT
Correct me if I'm wrong, but this:

>> Yes, any reduce function call should be after all the mappers have done
>> their work.

is strictly true only if speculative execution is explicitly turned off. Otherwise there is
a chance that some reduce tasks can actually start before all the maps are complete. If a map
output key consumed by a speculatively started reduce task is later emitted by a map that
finishes after that reduce task started, I believe the JobTracker then kills the speculative task.
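If you want to rule out this effect, speculative execution can be disabled in the job configuration. A minimal sketch, assuming the classic (pre-YARN) property names used by Hadoop of this era; treat the exact names as an assumption to verify against your release:

```xml
<!-- Sketch: disable speculative execution for both task types.
     Assumption: classic pre-YARN property names; newer releases
     renamed these (e.g. mapreduce.map.speculative). -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```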

-----Original Message-----
From: Gang Luo [mailto:lgpublic@yahoo.com.cn] 
Sent: Friday, January 29, 2010 2:27 PM
To: common-user@hadoop.apache.org
Subject: Re: do all mappers finish before reducer starts

It seems this is a hot issue.

When any mapper finishes (its sorted intermediate result is on the local disk), the shuffle
starts transferring that result to the corresponding reducers, even while other mappers are
still running. Because the shuffle is part of the reduce phase, the map and reduce phases can
be seen as overlapping to some extent. That is why you see such a progress report.

What you are actually asking about is the reduce function. Yes, any reduce function call should be
after all the mappers have done their work.
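The point at which reduce tasks are launched to begin shuffling is itself configurable. A hedged sketch, assuming the classic property name from this Hadoop generation (the value is the fraction of map tasks that must finish before reducers are scheduled):

```xml
<property>
  <!-- Assumption: classic pre-YARN name; later releases use
       mapreduce.job.reduce.slowstart.completedmaps. With a low
       value, reducers start shuffling early, which is why the
       progress report shows e.g. "map 60% reduce 30%". -->
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.05</value>
</property>
```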


----- Original Message ----
From: adeelmahmood <adeelmahmood@gmail.com>
To: core-user@hadoop.apache.org
Sent: Friday, 2010/1/29, 4:10:50 PM
Subject: do all mappers finish before reducer starts

I just have a conceptual question. My understanding is that all the mappers
have to complete their job before the reducers can start working, because mappers
don't know about each other: we need the values for a given key from all the
different mappers, so we have to wait until all mappers have collectively
given the system every possible value for a key, so that these can then be
passed on to the reducer.
But when I ran these jobs, almost every time the reducers started working before
the mappers were all done, so it would say map 60% reduce 30%. How does this work?
Does it find all possible values for a single key from all mappers, pass
them on to the reducer, and then work on other keys?
Any help is appreciated.
View this message in context: http://old.nabble.com/do-all-mappers-finish-before-reducer-starts-tp27330927p27330927.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

