It's part of the design that reduce() is not called until the map phase is complete. You're seeing the reduce reported as started when the map is at 90% complete because Hadoop is already shuffling data from the mappers that have completed. As currently designed, you can't start reduce() early, because there is no way to guarantee that you have all the values for any key until all the mappers are done. reduce() requires a key and all the values for that key in order to execute.

Jeff

On Tue, Jan 4, 2011 at 10:53 AM, sagar naik wrote:
> Hi All,
>
> number of map tasks: 1000s
> number of reduce tasks: single digit
>
> In such cases the reduce task won't start even when a few map tasks are
> completed.
> Example:
> In my observation of a sample run of bin/hadoop jar
> hadoop-*examples*.jar pi 10000 10, the reduce did not start until 90%
> of the map tasks were complete.
>
> The only reason I can think of for not starting a reduce task is to
> avoid the unnecessary transfer of map output data in case of
> failures.
>
> Is there a way to quickly start the reduce task in such a case?
> What is the configuration param to change this behavior?
>
> Thanks,
> Sagar
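
To illustrate the point in the reply about reduce() needing every value for a key, here is a toy simulation in plain Python. It is only a sketch and does not use the Hadoop API; the names map_task, shuffle, and reduce_fn and the sample input are made up for illustration. It shows that running the reduce step over the output of only some of the map tasks gives incomplete results.

```python
from collections import defaultdict

def map_task(split):
    """Hypothetical mapper: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in split.split()]

def shuffle(map_outputs):
    """Group values by key across the outputs of the given map tasks."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped

def reduce_fn(key, values):
    """Hypothetical reducer: sum all values seen for one key."""
    return key, sum(values)

def run_reduce(map_outputs):
    """Shuffle the given map outputs, then reduce each key once."""
    return dict(reduce_fn(k, v) for k, v in shuffle(map_outputs).items())

# Made-up input: three splits, one map task per split.
splits = ["a b a", "b c", "a c c"]
outputs = [map_task(s) for s in splits]

# Reducing after only two of the three map tasks misses values for "a" and "c".
partial = run_reduce(outputs[:2])

# Reducing after all map tasks have finished sees every value for every key.
full = run_reduce(outputs)

print("partial:", partial)
print("full:   ", full)
```

The shuffle step can start collecting output as soon as individual map tasks finish, which is why the reduce task shows as started, but the reduce function itself only gives correct totals once every map output has been grouped.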