hadoop-mapreduce-user mailing list archives

From "Bejoy KS" <bejoy.had...@gmail.com>
Subject Re: why is num of map tasks gets overridden?
Date Wed, 22 Aug 2012 06:03:01 GMT
Hi

There are two options I can think of now

1) If all your jobs are memory intensive, I'd recommend adjusting the task slots per node
accordingly.
2) If only a few jobs are memory intensive, you can have each map task process a smaller
volume of data. For that, set mapred.max.split.size to the largest data chunk a map task can
process within your current memory constraints.
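
To illustrate option 2, here is a rough sketch of how a maximum split size caps the data per map task. It assumes the rule used by the new-API FileInputFormat, max(minSize, min(maxSize, blockSize)), and ignores details such as the last-split slack. The class and method names are hypothetical, and the 1 GB file, 128 MB block and 32 MB cap are made-up numbers.

```java
// Sketch only: mirrors the max(min, min(max, block)) rule, not actual Hadoop source.
final class NewApiSplitSketch {

    // Split size chosen for a file, given the configured bounds and the DFS block size.
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of map tasks for one file: ceiling of fileBytes / splitBytes.
    static long estimateMaps(long fileBytes, long splitBytes) {
        return (fileBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;   // assumed DFS block size
        long maxSplit = 32 * mb;     // assumed cap that fits the per-task memory budget
        long split = splitSize(1, maxSplit, blockSize);
        System.out.println("split size (MB): " + split / mb);
        System.out.println("maps for a 1 GB file: " + estimateMaps(1024 * mb, split));
    }
}
```

In a real job the cap would be set on the job configuration, for example with conf.setLong("mapred.max.split.size", 32L * 1024 * 1024). The exact property name depends on the Hadoop version and on which FileInputFormat API the job uses, so please check it against your release.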
 
Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: nutch buddy <nutch.buddy@gmail.com>
Date: Wed, 22 Aug 2012 08:57:31 
To: <user@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: why is num of map tasks gets overridden?

So what can I do if I have a given input and my job needs a lot of memory
per map task?
I can't control the number of map tasks, and the total memory per machine is
limited - I'll eventually fill up each machine's memory.

On Tue, Aug 21, 2012 at 3:52 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:

>> Actually controlling the number of maps is subtle. The mapred.map.tasks
>> parameter is just a hint to the InputFormat for the number of maps. The
>> default InputFormat behavior is to split the total number of bytes into the
>> right number of fragments. However, in the default case the DFS block size
>> of the input files is treated as an upper bound for input splits. A lower
>> bound on the split size can be set via mapred.min.split.size. Thus, if you
>> expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k
>> maps, unless your mapred.map.tasks is even larger. Ultimately the
>> InputFormat <http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html>
>> determines the number of maps.
>>
>
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
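
The arithmetic in the quoted passage can be sketched as below. This assumes the old-API rule where the hinted map count yields a goal size (total bytes / hint) and the split size is max(minSize, min(goalSize, blockSize)). The class and method names are hypothetical, and treating the whole input as a single 10TB file is a simplification.

```java
// Sketch only: reproduces the "82k maps" estimate under the stated assumptions.
final class OldApiSplitSketch {

    // Split size derived from the hint (mapred.map.tasks), the lower bound and the block size.
    static long splitSize(long totalBytes, int hintedMaps, long minSize, long blockSize) {
        long goalSize = totalBytes / Math.max(hintedMaps, 1);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Approximate number of maps: ceiling of totalBytes / splitSize.
    static long numMaps(long totalBytes, int hintedMaps, long minSize, long blockSize) {
        long split = splitSize(totalBytes, hintedMaps, minSize, blockSize);
        return (totalBytes + split - 1) / split;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long tenTb = 10L * 1024 * 1024 * mb;   // 10TB of input
        long block = 128 * mb;                 // 128MB DFS blocks

        // A small hint is overridden: the block size bounds the split from above.
        System.out.println("hint 8      -> " + numMaps(tenTb, 8, 1, block));
        // A hint larger than the block-based count shrinks the splits instead.
        System.out.println("hint 200000 -> " + numMaps(tenTb, 200000, 1, block));
        // Raising the lower bound (mapred.min.split.size) reduces the map count.
        System.out.println("min 256MB   -> " + numMaps(tenTb, 8, 256 * mb, block));
    }
}
```

This is why a hint of 8 can still produce far more maps: the hint only lowers the split size, it never raises it above the block size.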
>
> Bertrand
>
>
> On Tue, Aug 21, 2012 at 2:19 PM, nutch buddy <nutch.buddy@gmail.com> wrote:
>
>> I configure a job in Hadoop and set the number of map tasks in the code to 8.
>>
>> Then I run the job and it gets 152 map tasks. I can't see why it's being
>> overridden and where it gets 152 from.
>>
>> The mapred-site.xml has 24 as mapred.map.tasks.
>>
>> any idea?
>>
>
>
>
> --
> Bertrand Dechoux
>
