hadoop-mapreduce-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: why is num of map tasks gets overridden?
Date Thu, 23 Aug 2012 11:30:37 GMT
Hi

You can adjust the slots in a TaskTracker/Node using
map slots -> mapred.tasktracker.map.tasks.maximum
reduce slots -> mapred.tasktracker.reduce.tasks.maximum

These are TaskTracker-level properties, so they cannot be overridden per job.
You need to edit mapred-site.xml on each TaskTracker (and, I believe, you
need to restart the TaskTracker as well).
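
A minimal sketch of what those entries could look like in each TaskTracker's
mapred-site.xml; the slot counts are illustrative and should be sized to the
node's cores and memory:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>  <!-- illustrative: map slots on this node -->
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>  <!-- illustrative: reduce slots on this node -->
    </property>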

Regards
Bejoy KS

On Thu, Aug 23, 2012 at 4:42 PM, nutch buddy <nutch.buddy@gmail.com> wrote:

> How do I adjust the number of slots per node?
> And also, is the parameter mapred.tasktracker.map.tasks.maximum relevant
> here?
>
> thanks
>
>
> On Wed, Aug 22, 2012 at 9:23 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
>
>> 3) Similarly to 2, you could consider multithreading. In each physical
>> node you would then only need the memory equivalent of a single map while
>> having the processing power of many. But it will depend on your context,
>> i.e. how you are using the memory.
>>
>> But 1) is really the key indeed: <number of slots per physical node> *
>> <maximum memory per slot> shouldn't exceed what is available on your
>> physical node.
>>
>>  Regards
>>
>> Bertrand
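
A minimal sketch of point 3, assuming a 0.20+/1.x-era Hadoop where the
new-API MultithreadedMapper wrapper is available; the mapper below is a
placeholder (it must be thread-safe) and the thread count is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    public class MultithreadedSlotSketch {

        // Placeholder mapper; it must be thread-safe to run under MultithreadedMapper.
        public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "memory-heavy-job");
            job.setJarByClass(MultithreadedSlotSketch.class);

            // One task slot (one JVM, one memory budget) drives several map threads.
            job.setMapperClass(MultithreadedMapper.class);
            MultithreadedMapper.setMapperClass(job, MyMapper.class);
            MultithreadedMapper.setNumberOfThreads(job, 4); // illustrative thread count

            // ... input/output formats and paths as usual, then job.waitForCompletion(true)
        }
    }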
>>
>>
>> On Wed, Aug 22, 2012 at 8:03 AM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
>>
>>>
>>> Hi
>>>
>>> There are two options I can think of now
>>>
>>> 1) If all your jobs are memory intensive, I'd recommend adjusting your
>>> task slots per node accordingly.
>>> 2) If only a few jobs are memory intensive, you can have each map task
>>> process a smaller volume of data. For that, set mapred.max.split.size to
>>> the maximum data chunk a map task can process within your current memory
>>> constraints.
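
A minimal sketch of option 2 in job-setup code, assuming the old-style
property name and the new-API FileInputFormat helper are both available in
your Hadoop version; the 64 MB cap is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SmallerSplitsSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Old-style property: cap the bytes handed to a single map task.
            conf.setLong("mapred.max.split.size", 64L * 1024 * 1024); // 64 MB, illustrative

            Job job = new Job(conf, "smaller-splits");
            // New-API helper that sets the same upper bound on the split size.
            FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
            // ... set mapper, input/output paths, then job.waitForCompletion(true)
        }
    }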
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from handheld, please excuse typos.
>>> ------------------------------
>>> From: nutch buddy <nutch.buddy@gmail.com>
>>> Date: Wed, 22 Aug 2012 08:57:31 +0300
>>> To: <user@hadoop.apache.org>
>>> Reply-To: user@hadoop.apache.org
>>> Subject: Re: why is num of map tasks gets overridden?
>>>
>>> So what can I do if I have a given input and my job needs a lot of
>>> memory per map task?
>>> I can't control the number of map tasks, and my total memory per machine
>>> is limited - I'll eventually fill each machine's memory.
>>>
>>> On Tue, Aug 21, 2012 at 3:52 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
>>>
>>>>> Actually controlling the number of maps is subtle. The mapred.map.tasks
>>>>> parameter is just a hint to the InputFormat for the number of maps. The
>>>>> default InputFormat behavior is to split the total number of bytes into the
>>>>> right number of fragments. However, in the default case the DFS block size
>>>>> of the input files is treated as an upper bound for input splits. A lower
>>>>> bound on the split size can be set via mapred.min.split.size. Thus, if you
>>>>> expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k
>>>>> maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat
>>>>> <http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html>
>>>>> determines the number of maps.
>>>>>
>>>>
>>>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>>>
>>>> Bertrand
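
To make the quoted arithmetic concrete: 10 TB of input at one split per
128 MB block is (10 * 1024 * 1024) MB / 128 MB = 81,920 (roughly 82k) maps.
A minimal sketch of raising the lower bound to get fewer, larger splits,
assuming the property and helper below exist in your Hadoop version; the
256 MB value is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class FewerSplitsSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Raising the lower bound merges blocks into fewer, larger splits.
            conf.setLong("mapred.min.split.size", 256L * 1024 * 1024); // 256 MB, illustrative

            Job job = new Job(conf, "fewer-splits");
            FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024); // new-API equivalent
            // mapred.map.tasks stays a hint; the InputFormat has the final say.
        }
    }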
>>>>
>>>>
>>>> On Tue, Aug 21, 2012 at 2:19 PM, nutch buddy <nutch.buddy@gmail.com> wrote:
>>>>
>>>>> I configure a job in Hadoop and set the number of map tasks in the code
>>>>> to 8.
>>>>>
>>>>> Then I run the job and it gets 152 map tasks. I can't figure out why it's
>>>>> being overridden and where the 152 comes from.
>>>>>
>>>>> The mapred-site.xml has 24 as mapred.map.tasks.
>>>>>
>>>>> Any idea?
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Bertrand Dechoux
>>>>
>>>
>>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>
>
