hadoop-common-dev mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject RE: why reduce task can be scheduled before map tasks are 100% completed?
Date Mon, 03 Aug 2009 04:52:27 GMT
And the combiner runs while fetching the outputs, right?

-----Original Message-----
From: Arun C Murthy [mailto:acm@yahoo-inc.com] 
Sent: Monday, August 03, 2009 9:27 AM
To: mapreduce-dev@hadoop.apache.org
Cc: common-dev@hadoop.apache.org
Subject: Re: why reduce task can be scheduled before map tasks are 100% completed?

That check ensures sufficient #maps are completed before any of the  
reduces for the job are started.

The reduces 'shuffle' map outputs from completed maps, but don't get  
into the 'reduce' phase until all map-outputs are copied over.

Arun

PS: Moving this to mapreduce-dev@

On Aug 2, 2009, at 5:02 AM, 我的Gmail邮箱 wrote:

> Hi, everyone.
> In class org.apache.hadoop.mapred.JobInProgress, there is a public
> method, scheduleReduces(), which returns true if "finishedMapTasks >=
> completedMapsForReduceSlowstart",
> and then the scheduler can schedule a new reduce task for a given
> taskTracker.
>
> But as far as I know, a reduce cannot be started until the maps are 100%
> completed. Can anyone explain this? Thanks a lot.
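
The distinction discussed in this thread (a reduce task may be scheduled early so it can shuffle, but its reduce phase waits for all map outputs) can be sketched as a small standalone model. This is not the Hadoop source: the class name SlowstartSketch, the slowstartFraction parameter, the ceil-based threshold, and the simplification that a finished map's output counts as copied immediately are all assumptions made for illustration. Only the comparison finishedMapTasks >= completedMapsForReduceSlowstart is taken from the question above.

```java
// Hypothetical model of the slowstart check; not the real JobInProgress class.
class SlowstartSketch {
    private final int numMapTasks;
    // Number of maps that must finish before any reduce is scheduled.
    private final int completedMapsForReduceSlowstart;
    private int finishedMapTasks = 0;

    // slowstartFraction is an assumed parameter: the fraction of maps
    // (0.0 to 1.0) that must complete before reduces may be scheduled.
    SlowstartSketch(int numMapTasks, double slowstartFraction) {
        this.numMapTasks = numMapTasks;
        this.completedMapsForReduceSlowstart =
                (int) Math.ceil(slowstartFraction * numMapTasks);
    }

    // Record one more completed map task (never exceeds the total).
    void mapFinished() {
        if (finishedMapTasks < numMapTasks) {
            finishedMapTasks++;
        }
    }

    // The check from the thread: reduces may be scheduled, and begin
    // shuffling outputs of completed maps, once enough maps are done.
    boolean scheduleReduces() {
        return finishedMapTasks >= completedMapsForReduceSlowstart;
    }

    // Simplification: a finished map's output is treated as already copied,
    // so the reduce phase itself may start only when every map has finished.
    boolean canEnterReducePhase() {
        return finishedMapTasks == numMapTasks;
    }

    public static void main(String[] args) {
        SlowstartSketch job = new SlowstartSketch(4, 0.5);
        for (int i = 1; i <= 4; i++) {
            job.mapFinished();
            System.out.println("maps finished=" + i
                    + " scheduleReduces=" + job.scheduleReduces()
                    + " canEnterReducePhase=" + job.canEnterReducePhase());
        }
    }
}
```

In this model, with 4 maps and a fraction of 0.5, scheduleReduces() turns true after the second map finishes, while canEnterReducePhase() stays false until the fourth. That gap is the window in which a scheduled reduce task is only shuffling.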
