hadoop-general mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Re: How do I trigger multiple Mapper tasks?
Date Mon, 18 Jan 2010 17:27:19 GMT
Thanks for the replies. NLineInputFormat uses JobConf, which has been
deprecated, so I would rather not use that class. But I looked at
FileInputFormat, which has the following method:

    FileInputFormat.setMinInputSplitSize(job, 100);

I thought that if I set the minimum input split size to 100, a Mapper would
be triggered for every 100 lines in the input file. My input file has 500
lines, so I was expecting to see 5 Mappers, but only one Mapper is triggered.
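A minimal sketch of why this yields one Mapper, assuming the split-sizing logic of the new-API FileInputFormat in Hadoop 0.20: the split size is computed as max(minSize, min(maxSize, blockSize)), and every value is in bytes, not lines. A minimum of 100 bytes therefore never shrinks a split below the block size. The class, its methods, the 64 MB block size, and the "500 lines of about 100 bytes" file are hypothetical stand-ins chosen for illustration; SPLIT_SLOP mirrors the 10% tolerance FileInputFormat allows on the last split.

```java
// Hypothetical model (not Hadoop code) of how the new-API FileInputFormat
// in Hadoop 0.20 appears to size splits for a single file.
class SplitSizeSketch {

    // The last split may run up to 10% over splitSize instead of
    // producing a tiny trailing split (SPLIT_SLOP in FileInputFormat).
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize)); all values in BYTES.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore map tasks) for one file of fileLength bytes.
    static int countSplits(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        int splits = 0;
        // Carve off full-size splits while more than 1.1 splits' worth remains.
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left (up to 1.1 x splitSize) becomes the final split.
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB DFS block size
        long fileLength = 500L * 100;       // assumed: 500 lines of ~100 bytes each
        // Minimum of 100 BYTES: split size stays at the block size -> 1 split.
        System.out.println(countSplits(fileLength, blockSize, 100, Long.MAX_VALUE));
        // Capping the split size at roughly 100 lines' worth of bytes -> 5 splits.
        System.out.println(countSplits(fileLength, blockSize, 1, 100L * 100));
    }
}
```

The model suggests that the knob to turn is the maximum split size (the new API has FileInputFormat.setMaxInputSplitSize(job, size) alongside setMinInputSplitSize), not the minimum. Because splits are byte ranges, "100 lines per Mapper" would only hold approximately, and only if the lines are of similar length.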

Please help.  Thanks.


On Sun, Jan 17, 2010 at 11:45 PM, Amareshwari Sri Ramadasu <amarsri@yahoo-inc.com> wrote:

>
> Changing the audience to mapreduce-user.
>
> Setting the number of map tasks (mapred.map.tasks or
> JobConf.setNumMapTasks()) does not guarantee that the number of maps in the
> job will be set to that value. It is only used as a hint. The number of maps
> is decided by your InputFormat. You should implement InputFormat.getSplits()
> to define how the input should be split. The rule is: "the number of splits
> equals the number of maps".
> If you are using the default InputFormat (i.e. TextInputFormat), the number
> of maps is decided by the DFS block size. If you use NLineInputFormat with
> mapred.line.input.format.linespermap=1, the number of maps will be the number
> of lines in the file.
> More details:
>
> http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks%28int%29
>
> Thanks
> Amareshwari
> On 1/18/10 12:51 PM, "Something Something" <mailinglists19@gmail.com>
> wrote:
>
> Hello,
>
> I read the documentation about running multiple Mapper tasks, but I can't
> get multiple Mappers to work.  I am running under EC2 with 10 nodes.
>
> Here's what I know:
>
> 1) As I understand it, by default the number of Mapper tasks is decided by
> the DFS block size, but I would like to override that. My file is small,
> but each line triggers fairly long-running, complicated calculations that
> should be run in parallel.
>
> 2) I tried setting the following property in mapred-site.xml (only on the
> Master), but that doesn't seem to help:
>
> <property>
>  <name>mapred.map.tasks</name>
>  <value>10</value>
> </property>
>
> I still see the following message:
>
> 10/01/18 01:56:34 INFO mapred.JobClient:     Launched map tasks=1
> 10/01/18 01:56:34 INFO mapred.JobClient:     Data-local map tasks=1
>
> (Also, I know for a fact that multiple mappers are not running!)
>
>
> 3) I read somewhere that JobConf has a method called setNumMapTasks, but
> this class has been deprecated, so I am not using it. Besides, I've heard
> that this method only provides a hint to Hadoop.
>
> So how do I trigger multiple Mapper tasks?  Please let me know.  Thanks.
>
>
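Amareshwari's suggestion of NLineInputFormat with mapred.line.input.format.linespermap=1 works because that format cuts splits by line count instead of by bytes. The sketch below models only that grouping idea in plain Java: every linesPerMap consecutive lines become one split, and each split would get its own map task. The class and method names are hypothetical, and the real class (org.apache.hadoop.mapred.lib.NLineInputFormat in the old API) is understood to scan the file for line offsets rather than hold the lines in memory as this model does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not Hadoop code) of what NLineInputFormat does
// conceptually: every N input lines become one split / one map task.
class NLineSketch {

    // Groups lines into consecutive chunks of at most linesPerMap lines;
    // the final chunk holds whatever is left over.
    static List<List<String>> splitByLines(List<String> lines, int linesPerMap) {
        if (linesPerMap < 1) {
            throw new IllegalArgumentException("linesPerMap must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerMap) {
            int end = Math.min(start + linesPerMap, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 1; i <= 500; i++) {
            input.add("record " + i); // stand-in for one line of the 500-line file
        }
        System.out.println(splitByLines(input, 1).size());   // linespermap=1: one map per line
        System.out.println(splitByLines(input, 100).size()); // linespermap=100: five maps
    }
}
```

Splitting by line count fits this job because the cost is dominated by per-line computation rather than by input size, so equal line counts give each Mapper a comparable amount of work regardless of how many bytes the lines occupy.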
