hadoop-hdfs-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: why is num of map tasks gets overridden?
Date Tue, 21 Aug 2012 12:52:42 GMT
> Actually controlling the number of maps is subtle. The mapred.map.tasks
> parameter is just a hint to the InputFormat for the number of maps. The
> default InputFormat behavior is to split the total number of bytes into the
> right number of fragments. However, in the default case the DFS block size
> of the input files is treated as an upper bound for input splits. A lower
> bound on the split size can be set via mapred.min.split.size. Thus, if you
> expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k
> maps, unless your mapred.map.tasks is even larger. Ultimately the
> InputFormat
> <http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html>
> determines the number of maps.
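
To make the arithmetic in that quote concrete, here is a rough sketch of how the old-API FileInputFormat picks a split size, as far as I understand it: the hint gives a goal size, the block size caps it, and mapred.min.split.size sets the floor. The class name SplitEstimate and the method estimateMaps are made up for illustration. The sketch treats the whole input as one big file and ignores per-file boundaries and the small slack Hadoop allows on the last split, so take the numbers as estimates only.

```java
public class SplitEstimate {

    // Split size rule: never below minSize, otherwise the smaller of
    // the goal size (derived from the hint) and the DFS block size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Hypothetical helper: estimates the number of maps for a single
    // input of totalBytes. Real jobs compute splits file by file.
    static long estimateMaps(long totalBytes, long blockSize,
                             long minSplitSize, int mapTasksHint) {
        long goalSize = totalBytes / Math.max(mapTasksHint, 1);
        long splitSize = computeSplitSize(
                goalSize, Math.max(minSplitSize, 1), blockSize);
        // Ceiling division: a partial trailing split still needs a map.
        return (totalBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long total = 10L * 1024 * 1024 * 1024 * 1024; // 10TB of input
        long block = 128L * 1024 * 1024;              // 128MB DFS blocks

        // A hint of 8 is far below what the block size allows,
        // so the block size decides the number of maps.
        System.out.println("hint=8:        "
                + estimateMaps(total, block, 1, 8));

        // A hint larger than the block-based count shrinks the splits.
        System.out.println("hint=200000:   "
                + estimateMaps(total, block, 1, 200000));

        // Raising mapred.min.split.size above the block size
        // is the way to get fewer maps.
        System.out.println("min=256MB:     "
                + estimateMaps(total, block, 256L * 1024 * 1024, 8));
    }
}
```

With these inputs the first case gives 81920 maps, which is the "82k" in the quote. It also shows why asking for 8 (or 24) maps has no effect when the input spans more blocks than that: a small hint only raises the goal size, and the block size still caps the split.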



On Tue, Aug 21, 2012 at 2:19 PM, nutch buddy <nutch.buddy@gmail.com> wrote:

> I configure a job in Hadoop and set the number of map tasks in the code to 8.
> Then I run the job and it gets 152 map tasks. I can't see why it's being
> overridden and where it gets 152 from.
> The mapred-site.xml has 24 as mapred.map.tasks.
> Any idea?

Bertrand Dechoux
