hadoop-mapreduce-user mailing list archives

From "pig" <tjuhzjem...@qq.com>
Subject Re: When a Reduce Task starts?
Date Fri, 24 Dec 2010 05:20:00 GMT
Excellent answer.

For some special reduce jobs that do not rely on the order of the (key, value) pairs, the sort
phase is of no use.
In that situation, theoretically speaking, reduce could be started before all of the map tasks
have finished. So why doesn't Hadoop support this feature? For example, it could be specified as an
argument when submitting a job.

------------------ Original ------------------
From:  "Harsh J"<qwertymaniac@gmail.com>;
Date:  Tue, Dec 21, 2010 03:13 PM
To:  "mapreduce-user"<mapreduce-user@hadoop.apache.org>; 
Subject:  Re: When a Reduce Task starts?

On Tue, Dec 21, 2010 at 7:23 AM, li ping <li.j2ee@gmail.com> wrote:
> I think the reduce can be started before all of the map finished.
> See the configuration item in mapred-site.xml:
> <property>
>   <name>mapred.reduce.slowstart.completed.maps</name>
>   <value>0.05</value>
>   <description>Fraction of the number of maps in the job which should be
>   complete before reduces are scheduled for the job.
>   </description>
> </property>
> Correct me, if I'm wrong.

Well, it depends on what you mean by a "reduce". A ReduceTask, in
Hadoop terms, may begin as some maps complete (as configured using
mapred.reduce.slowstart.completed.maps) -- but it would only be in
the copy phase (not sort/reduce).
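
To make the slowstart fraction concrete, here is a rough sketch of how it could translate into a number of completed maps. The class and method names are hypothetical (they are not Hadoop APIs), and rounding up with ceil is an assumption made for illustration, not something stated in this thread.

```java
// Hypothetical helper, not part of Hadoop: illustrates the slowstart fraction.
final class SlowstartSketch {

    // Number of maps that must complete before reduces may be scheduled.
    // Rounding up is an assumption made for this sketch.
    static int mapsNeeded(double slowstartFraction, int totalMaps) {
        if (slowstartFraction < 0.0 || slowstartFraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        return (int) Math.ceil(slowstartFraction * totalMaps);
    }

    // True once enough maps are done for ReduceTasks to be scheduled
    // (at which point they would only start copying map output).
    static boolean reducesMayBeScheduled(double slowstartFraction,
                                         int totalMaps,
                                         int completedMaps) {
        return completedMaps >= mapsNeeded(slowstartFraction, totalMaps);
    }

    public static void main(String[] args) {
        System.out.println(mapsNeeded(0.5, 7));               // 4
        System.out.println(reducesMayBeScheduled(0.5, 7, 3)); // false
    }
}
```

Under these assumptions, a fraction of 1.0 means no ReduceTask is scheduled until every map is done, while 0.0 lets ReduceTasks be scheduled immediately.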

With the current Hadoop implementation, a reduce(Key, Iterable<Value>)
will never be called until all mappers have completed.
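
A toy model may help show why this is so. The sketch below is not Hadoop code; the class and its methods are invented for illustration. It copies map outputs as they arrive, but refuses to reduce until every map has reported, because any later map could still contribute values for any key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, not Hadoop code: a reducer that copies map outputs as they
// arrive but can only group values by key once every map has reported.
final class ReducePhaseSketch {
    private final int totalMaps;
    private int copiedMaps = 0;
    // TreeMap keeps keys sorted, standing in for the sort/merge phase.
    private final Map<String, List<Integer>> grouped = new TreeMap<>();

    ReducePhaseSketch(int totalMaps) {
        this.totalMaps = totalMaps;
    }

    // Copy phase: may run while other maps are still executing.
    void copyMapOutput(Map<String, Integer> mapOutput) {
        for (Map.Entry<String, Integer> e : mapOutput.entrySet()) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(e.getValue());
        }
        copiedMaps++;
    }

    boolean readyToReduce() {
        return copiedMaps == totalMaps;
    }

    // Reduce phase: sums the values of each key. It refuses to run early
    // because the value list of a key is not yet known to be complete.
    Map<String, Integer> reduceAll() {
        if (!readyToReduce()) {
            throw new IllegalStateException("maps still running");
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }
}
```

Even when the order of keys does not matter, the complete list of values for a given key is only known after the last map has finished, which is what this model tries to capture.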

Harsh J