hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject Re: do all mappers finish before reducer starts
Date Fri, 29 Jan 2010 22:27:01 GMT
It seems this is a hot topic.

As soon as any mapper finishes (its sorted intermediate result is on local disk), the shuffle starts
transferring that result to the corresponding reducers, even while other mappers are still working.
Because the shuffle is counted as part of the reduce phase, the map phase and the reduce phase
overlap to some extent. That is why you see such a progress report.

What you are actually describing is the reduce function. Yes, a reduce function call can only
happen after all the mappers have finished their work.
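
To make the distinction concrete, here is a minimal single-process sketch in Python. It is not Hadoop code: run_job, map_outputs and reduce_fn are hypothetical names made up for illustration, and it assumes a single reducer. It only shows the ordering described above: each mapper's output is copied as soon as that mapper finishes, while the reduce function is not called until every mapper's output has been copied.

```python
from collections import defaultdict


def run_job(map_outputs, reduce_fn):
    """Toy model of one reducer (hypothetical helper, not a Hadoop API).

    map_outputs: one list of (key, value) pairs per mapper, in the order
    in which the mappers finish.
    """
    fetched = defaultdict(list)  # reducer-side buffer filled by the shuffle
    events = []
    total = len(map_outputs)

    for done, output in enumerate(map_outputs, start=1):
        # Shuffle: copy this mapper's output as soon as it finishes,
        # even though later mappers are still running.
        for key, value in output:
            fetched[key].append(value)
        events.append(f"copied output of mapper {done}/{total}")

    # Reduce calls begin only now, when every key has all of its values.
    events.append("all mappers done, calling reduce")
    results = {key: reduce_fn(key, values)
               for key, values in sorted(fetched.items())}
    return results, events


if __name__ == "__main__":
    outputs = [[("a", 1), ("b", 2)], [("a", 3)]]
    results, events = run_job(outputs, lambda key, values: sum(values))
    for line in events:
        print(line)
    print(results)
```

In this model the "copied output" lines stand for the part of the reduce progress that advances while maps are still running; the reduce function itself only ever sees complete value lists.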

 -Gang


----- Original Message ----
From: adeelmahmood <adeelmahmood@gmail.com>
To: core-user@hadoop.apache.org
Sent: 2010/1/29 (Fri) 4:10:50 PM
Subject: do all mappers finish before reducer starts


I just have a conceptual question. My understanding is that all the mappers
have to complete their work before the reducers can start, because mappers
don't know about each other: we need the values for a given key from all the
different mappers, so we have to wait until all mappers have collectively
given the system all possible values for a key, so that they can then be
passed on to the reducer.
But when I ran these jobs, almost every time the reducers started working
before the mappers were all done, so it would say map 60% reduce 30%. How
does this work?
Does it find all possible values for a single key from all mappers, pass
them on to the reducer, and then work on the other keys?
Any help is appreciated.
-- 
View this message in context: http://old.nabble.com/do-all-mappers-finish-before-reducer-starts-tp27330927p27330927.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

