hadoop-common-user mailing list archives

From kaveh minooie <ka...@plutoz.com>
Subject Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning
Date Thu, 23 Feb 2012 19:46:21 GMT

On 02/22/2012 03:38 AM, sangroya wrote:
> Hello,
> Could someone please help me to understand these configuration parameters in
> depth.
> mapred.map.tasks and mapred.reduce.tasks
> It is mentioned that default value of these parameters is 2 and 1.
> *What does it mean?*
> Does it mean 2 maps and 1 reduce per node.
> Does it mean 2 maps and 1 reduce in total (for the cluster). Or
> Does it mean 2 maps and 1 reduce per Job.

It is the suggested number of map and reduce tasks that each job will 
create if no other factor overrides it. In my experience, these 
settings are useful for specifying the minimum number of tasks that 
you want each job to have.
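For reference, those defaults correspond to entries like the following in the Hadoop configuration (property names as used in Hadoop 0.x/1.x; the values shown are the stock defaults mentioned above):

```xml
<!-- Suggested number of map tasks per job. This is only a hint:
     the input format ultimately decides how many input splits,
     and therefore how many map tasks, a job gets. Default: 2. -->
<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
</property>

<!-- Number of reduce tasks per job. Default: 1. -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
</property>
```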
> Can we change maps and reduce for default example Jobs such as Wordcount
> etc. too?

You can, of course, change the default values in your nutch-site.xml 
file, but if you want to specify them individually for each job, you 
have to set them on the command line when you run the job.
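For example, jobs that use Hadoop's generic option parsing (the bundled examples, including wordcount, do) accept -D overrides before their own arguments. A sketch, with illustrative jar name and paths:

```shell
# Paths and jar name are hypothetical; -D options must come
# before the job's own arguments. mapred.map.tasks is only a
# hint, while mapred.reduce.tasks is honored as given.
hadoop jar hadoop-examples.jar wordcount \
  -Dmapred.map.tasks=10 \
  -Dmapred.reduce.tasks=4 \
  /user/me/input /user/me/output
```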
> At the same time, I believe that total number of maps are dependent upon
> input data size?

Yes, the ultimate factor is the number of input files (not their 
size), since each input file produces at least one input split and 
each split is assigned its own map task.

So, for example, the situation that I myself had trouble figuring out 
was that I wanted a different number of map tasks for fetch jobs, and 
I found that the best way to control that is the -numFetchers switch 
of the generate command.
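Concretely, that looks something like the following (crawl db and segments paths are illustrative; the switch belongs to Nutch's generate command):

```shell
# Split the generated fetch list into 5 partitions, so the
# subsequent fetch job runs with 5 map tasks.
bin/nutch generate crawl/crawldb crawl/segments -numFetchers 5
```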

Kaveh Minooie

