hadoop-general mailing list archives

From Amareshwari Sri Ramadasu <amar...@yahoo-inc.com>
Subject Re: How do I trigger multiple Mapper tasks?
Date Mon, 18 Jan 2010 07:45:48 GMT

Changing the audience to mapreduce-user.

Setting the number of map tasks (mapred.map.tasks or JobConf.setNumMapTasks()) does not guarantee
that the job will run with that many maps; the value is only used as a hint. The number of
maps is decided by your InputFormat: the number of maps is equal to the number of splits that
InputFormat.getSplits() returns, so you can implement getSplits() to define how the input
should be split. If you are using the default InputFormat (i.e. TextInputFormat), the number
of maps is decided by the DFS block size. If you use NLineInputFormat with
mapred.line.input.format.linespermap=1, the number of maps will equal the number of lines in the file.
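
To make the arithmetic concrete, here is a rough sketch in plain Java (no Hadoop dependency) of how the map count follows from the split count in the two cases above. The class name, method names, the 64 MB block size, and the file figures are all assumptions chosen for illustration. The block-based estimate is a simplification: it ignores the minimum split size, the map-count hint, and any slack a real InputFormat allows on the last split.

```java
public class MapCountSketch {

    // Block-based splitting (simplified): about one split, and so one map,
    // per DFS block. Ignores minimum split size and the map-count hint.
    public static long mapsForBlockSplits(long fileSizeBytes, long blockSizeBytes) {
        if (blockSizeBytes <= 0) {
            throw new IllegalArgumentException("blockSizeBytes must be positive");
        }
        if (fileSizeBytes <= 0) {
            return 0;
        }
        // Ceiling division: a partial last block still needs its own split.
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Line-based splitting: one split per group of linesPerMap input lines.
    public static long mapsForLineSplits(long totalLines, long linesPerMap) {
        if (linesPerMap <= 0) {
            throw new IllegalArgumentException("linesPerMap must be positive");
        }
        if (totalLines <= 0) {
            return 0;
        }
        return (totalLines + linesPerMap - 1) / linesPerMap;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB DFS block size
        long fileSize = 10L * 1024;         // hypothetical 10 KB input file
        long lines = 200;                   // hypothetical number of lines in it

        // A small file fits in one block, so block-based splitting gives one map.
        System.out.println("block-based maps: " + mapsForBlockSplits(fileSize, blockSize));
        // With one line per map, the same file gives one map per line.
        System.out.println("line-based maps:  " + mapsForLineSplits(lines, 1));
    }
}
```

With these made-up numbers the block-based case yields a single map, which matches the "Launched map tasks=1" symptom for a small file, while one line per map yields 200 maps.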
More details @

On 1/18/10 12:51 PM, "Something Something" <mailinglists19@gmail.com> wrote:


I read the documentation about running multiple Mapper tasks, but I can't
get multiple Mappers to work.  I am running under EC2 with 10 nodes.

Here's what I know:

1)   I guess, by default, No. of Mapper tasks will be decided by DFS block
size, but I would like to override that.  My file is small, but each line
triggers fairly long-running, complicated calculations that should be run in
parallel.

2)  I tried setting the following property in the mapred-site.xml (only on
Master), but that doesn't seem to help:


I still see the following message:

10/01/18 01:56:34 INFO mapred.JobClient:     Launched map tasks=1
10/01/18 01:56:34 INFO mapred.JobClient:     Data-local map tasks=1

(Also, I know for a fact that multiple mappers are not running!)

3) I read somewhere that JobConf has a method called setNumMapTasks, but
this class has been deprecated, and as such I am not using it.  Besides, this
method just provides a hint to Hadoop, I heard.

So how do I trigger multiple Mapper tasks?  Please let me know.  Thanks.
