hadoop-common-user mailing list archives

From Mingjiang Shi <m...@gopivotal.com>
Subject Re: number of map tasks on yarn
Date Wed, 02 Apr 2014 02:54:34 GMT
+1 for Wangda's comment.

My 2 cents:
There are two aspects to the problem:
1. How many map tasks a job has.
2. How many map tasks can run concurrently.

For #1, see Wangda's comments.
For #2, it depends on the cluster resources. In your case, with 3 nodes of
8192 MB each and 1024 MB containers, the cluster can run at most 24 map
tasks concurrently.
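
To make the two numbers concrete, here is a rough sketch in Python. It is not
Hadoop code: the function names are made up for illustration, the split count
only approximates what FileInputFormat does for a single splittable, non-empty
file, and the 4 GiB input with 128 MiB blocks is a hypothetical example, not
the actual word count input. It also assumes each map container takes exactly
the minimum allocation of 1024 MB.

```python
SPLIT_SLOP = 1.1  # FileInputFormat lets the last split run up to 10% over split size


def count_file_splits(file_size, block_size, min_size=1, max_size=2**63 - 1):
    """Approximate number of splits for one splittable, non-empty file."""
    if file_size <= 0:
        raise ValueError("this sketch only handles non-empty files")
    # Split size is the block size clamped between the min and max split sizes.
    split_size = max(min_size, min(max_size, block_size))
    splits = 0
    remaining = file_size
    # Carve off full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


def max_concurrent_containers(nodes, node_memory_mb, container_memory_mb):
    """Upper bound on containers that fit in memory across the cluster."""
    return nodes * (node_memory_mb // container_memory_mb)


MIB = 1024 * 1024

# Aspect #1: map tasks in the job, one per input split (hypothetical input).
map_tasks = count_file_splits(4096 * MIB, 128 * MIB)

# Aspect #2: containers that fit at once, using the settings from the question.
concurrent = max_concurrent_containers(3, 8192, 1024)

print(map_tasks, concurrent)
```

With these assumed inputs the job has more map tasks than can run at once, so
the extra ones simply wait for a free container. The ApplicationMaster also
occupies a container, so the number of maps actually running at the same time
can be somewhat lower than the upper bound.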



On Wed, Apr 2, 2014 at 10:45 AM, Wangda Tan <wheeleast@gmail.com> wrote:

> More specifically, the number of map tasks for each job depends on
> InputFormat.getSplits(...). The number of map tasks is the same as the
> number of splits returned by InputFormat.getSplits(...). You can read the
> source code of FileInputFormat to understand this better.
>
>
>
> Regards,
> Wangda Tan
>
>
> On Wed, Apr 2, 2014 at 10:23 AM, Stanley Shi <sshi@gopivotal.com> wrote:
>
>> map task number is not decided by the resources you need.
>> It's decided by something else.
>>
>> Regards,
>> *Stanley Shi,*
>>
>>
>>
>> On Wed, Apr 2, 2014 at 9:08 AM, Libo Yu <yu_libo@hotmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am using mostly default YARN settings to run a word count example on
>>> a 3-node cluster. Here are my settings:
>>> yarn.nodemanager.resource.memory-mb 8192
>>> yarn.scheduler.minimum-allocation-mb 1024
>>> yarn.scheduler.maximum-allocation-vcores 32
>>>
>>> I would expect to see 8192/1024 * 3 = 24 map tasks.
>>> However, I see 32 map tasks. Does anybody know why? Thanks.
>>>
>>> Libo
>>>
>>>
>>
>


-- 
Cheers
-MJ
