hadoop-common-user mailing list archives

From Paolo Rodeghiero <paolo....@gmail.com>
Subject Re: reducing mappers for a job
Date Thu, 17 Nov 2011 10:42:24 GMT
On 17/11/2011 05:00, He Chen wrote:
> Hi Jay Vyas
>
> Ke yuan's method may decrease the number of mappers because, by default,
>
> the number of mappers for a job = the number of blocks in the job's input
> file.
>

Hi,
I'm not in production yet, so I'm just referencing things I've read.
First, it may be obvious, but remember that if you are using multiple 
input files, at least one map task is assigned for each file.

So to minimize the number of map tasks you can:
- aggregate the input data into a single file, ideally a splittable 
sequence file (using the SequenceFile class; see the sketch after this 
list)
- increase the HDFS block size and the input split size, which are 
controlled by different properties (dfs.block.size, 
mapred.max.split.size and mapred.min.split.size; see the example 
further below)
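
For the first point, here is a minimal, untested sketch of packing a 
directory of small files into one SequenceFile, with the file name as 
key and the raw bytes as value (class and argument names are just 
placeholders):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path inDir = new Path(args[0]);   // directory of small input files
    Path outFile = new Path(args[1]); // the single aggregated file

    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, outFile, Text.class, BytesWritable.class);
    try {
      for (FileStatus stat : fs.listStatus(inDir)) {
        byte[] buf = new byte[(int) stat.getLen()];
        FSDataInputStream in = fs.open(stat.getPath());
        try {
          in.readFully(buf);
        } finally {
          in.close();
        }
        // key = original file name, value = raw file contents
        writer.append(new Text(stat.getPath().getName()),
                      new BytesWritable(buf));
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}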

Note that locality can decrease if mapred.min.split.size > 
dfs.block.size, because you are forcing each split to cover more than 
one block.
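
For reference, FileInputFormat derives the split size roughly as 
(simplified from computeSplitSize() in the sources):

  splitSize = Math.max(minSize, Math.min(maxSize, blockSize));

so a minimum split size above the block size wins, each split is then 
forced to span more than one block, and part of that data will usually 
live on another node.
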
Finally, the mapred.*.split.size properties behave in slightly 
different ways depending on which API and which FileInputFormat 
subclass you are using.
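
For example, with the new (mapreduce) API you can set the bounds 
through static helpers on its FileInputFormat, while the old (mapred) 
API reads the properties from the job configuration. A sketch, with 
arbitrary values:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // old-API style: set the property directly on the configuration
    conf.setLong("mapred.min.split.size", 256L * 1024 * 1024); // 256 MB

    Job job = new Job(conf, "fewer-mappers");
    // new-API style: static helpers on the new FileInputFormat
    FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);
    FileInputFormat.setMaxInputSplitSize(job, 512L * 1024 * 1024);
  }
}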

References
----
Tom White, Hadoop: The Definitive Guide (Second Edition), pp. 116-120, 
202-203

