hadoop-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: number of mappers allowed in a container in hadoop2
Date Wed, 15 Oct 2014 14:37:20 GMT
No. In YARN, a container is just a container; there is no special kind of
container. It is the M/R framework that you run on top of YARN that treats
them as 'map' and 'reduce' containers.

I recommend reading the documentation about the architecture and design of
YARN at the link below. It will answer a lot of your questions.

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
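
For illustration, here is a minimal sketch of how an application asks YARN for
a container (assuming the standard AMRMClient API from hadoop-yarn-client;
this is not a complete application master). Note that the request describes
only memory and vcores; there is no 'map' or 'reduce' type anywhere:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class GenericContainerRequest {
  public static void main(String[] args) {
    // A container is just a bundle of resources; the framework running
    // on top of YARN (M/R, Tez, etc.) decides what to execute inside it.
    Resource capability = Resource.newInstance(2048 /* MB */, 1 /* vcore */);
    ContainerRequest request = new ContainerRequest(
        capability, null /* nodes */, null /* racks */, Priority.newInstance(0));

    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(new YarnConfiguration());
    rmClient.start();
    rmClient.addContainerRequest(request); // same call for any kind of task
  }
}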

Regards,
Shahab

On Wed, Oct 15, 2014 at 10:18 AM, SACHINGUPTA <sachin@datametica.com> wrote:

>  Thanks for the reply.
>
> I have one more doubt:
>
> are there three kinds of containers with different memory sizes in Hadoop 2?
>
> 1. normal container
> 2. map task container
> 3. reduce task container
>
> On Wednesday 15 October 2014 07:33 PM, Shahab Yunus wrote:
>
> The data that each map task will process is different from the memory that
> the task itself might require, which depends on whatever processing you
> plan to do in the task.
>
>  A very trivial example: let us say your map gets 128 MB of input data, but
> your task logic is such that it creates lots of String objects and ArrayList
> objects; wouldn't your memory requirement for the task then be greater than
> your input data?
>
>  I think you are confusing the size of the input data to the map/task
> with the actual memory required by the map/task itself to do its work.
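>
>  To make that concrete, here is a hypothetical mapper sketch (the class and
> field names are invented for illustration): its input split may be only
> 128 MB, but it buffers every token it sees, so its heap usage can grow well
> past the split size:
>
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Mapper;
>
> public class TokenBufferingMapper
>     extends Mapper<LongWritable, Text, Text, LongWritable> {
>   // Accumulates a String per input token across the whole split, so the
>   // memory needed is far more than the 128 MB of input data itself.
>   private final List<String> seenTokens = new ArrayList<>();
>
>   @Override
>   protected void map(LongWritable key, Text value, Context context)
>       throws IOException, InterruptedException {
>     for (String token : value.toString().split("\\s+")) {
>       seenTokens.add(token);
>       context.write(new Text(token), new LongWritable(1));
>     }
>   }
> }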
>
>  Regards,
> Shahab
>
> On Wed, Oct 15, 2014 at 9:44 AM, SACHINGUPTA <sachin@datametica.com>
> wrote:
>
>>  It is still not clear to me. Let's suppose the block size of my HDFS is
>> 128 MB, so every mapper will process only 128 MB of data. Then what is the
>> meaning of setting the property mapreduce.map.memory.mb? That is already
>> known from the block size, so why this property?
>>
>>
>>
>> On Wednesday 15 October 2014 07:06 PM, Shahab Yunus wrote:
>>
>> Explanation here:
>>
>>
>> http://stackoverflow.com/questions/24070557/what-is-the-relation-between-mapreduce-map-memory-mb-and-mapred-map-child-jav
>>
>> https://support.pivotal.io/hc/en-us/articles/201462036-Mapreduce-YARN-Memory-Parameters
>>
>> http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
>> (scroll towards the end.)
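>>
>>  In short (a sketch of the relationship those links describe; the numbers
>> here are just example values): mapreduce.map.memory.mb is the size of the
>> YARN container the map task runs in, while the task's JVM heap must be set
>> somewhat smaller so the JVM's non-heap overhead still fits in the container:
>>
>> import org.apache.hadoop.conf.Configuration;
>>
>> public class TaskMemoryConfig {
>>   public static void main(String[] args) {
>>     Configuration conf = new Configuration();
>>     // Container sizes requested from YARN for each map/reduce task.
>>     conf.set("mapreduce.map.memory.mb", "2048");
>>     conf.set("mapreduce.reduce.memory.mb", "4096");
>>     // The JVM heap must stay below the container size (~80% is a common
>>     // rule of thumb), or YARN may kill the task for exceeding its limit.
>>     conf.set("mapreduce.map.java.opts", "-Xmx1638m");
>>     conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");
>>   }
>> }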
>>
>>  Regards,
>> Shahab
>>
>> On Wed, Oct 15, 2014 at 9:24 AM, SACHINGUPTA <sachin@datametica.com>
>> wrote:
>>
>>>  I have one more doubt. I was reading this:
>>>
>>>
>>> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
>>>
>>> There are properties such as:
>>>
>>>   mapreduce.map.memory.mb    = 2 * 1024 MB
>>>   mapreduce.reduce.memory.mb = 2 * 2 * 1024 = 4 * 1024 MB
>>>
>>> What are these properties mapreduce.map.memory.mb and
>>> mapreduce.reduce.memory.mb?
>>>
>>> On Wednesday 15 October 2014 06:17 PM, Shahab Yunus wrote:
>>>
>>> It cannot run more mappers (tasks) in parallel than the underlying cores
>>> available. Just like it cannot run multiple mappers in parallel if each
>>> mapper's (task's) memory requirements are greater than the allocated and
>>> available container size configured on each node.
>>>
>>>  In the links that I provided earlier, see the section
>>> "Configuring YARN".
>>>
>>>  Also this:
>>> http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
>>> Section "1. YARN Concurrency (aka “What Happened to Slots?”)"
>>>
>>>  This should help put things in perspective regarding how the resource
>>> allocation for each task, the container size, and the resources available
>>> on the node relate to each other.
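>>>
>>>  As a worked example (the node sizes here are assumptions, not taken from
>>> your cluster): with 8192 MB of NodeManager memory and 2048 MB containers,
>>> YARN can fit 8192 / 2048 = 4 containers on a node, and with 4 cores you
>>> would not want more than 4 CPU-bound tasks running at once anyway:
>>>
>>> public class ContainersPerNode {
>>>   public static void main(String[] args) {
>>>     int nodeMemoryMb = 8192; // yarn.nodemanager.resource.memory-mb (assumed)
>>>     int containerMb = 2048;  // mapreduce.map.memory.mb (assumed)
>>>     int cores = 4;           // physical cores on the node
>>>
>>>     int byMemory = nodeMemoryMb / containerMb; // 4 containers fit by memory
>>>     int parallel = Math.min(byMemory, cores);  // practical CPU ceiling
>>>     System.out.println("Concurrent map tasks per node: " + parallel);
>>>   }
>>> }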
>>>
>>>  Regards,
>>> Shahab
>>>
>>> On Wed, Oct 15, 2014 at 8:18 AM, SACHINGUPTA <sachin@datametica.com>
>>> wrote:
>>>
>>>>  But Shahab, if I have only a 4-core machine, then how can YARN run more
>>>> than 4 mappers in parallel?
>>>> On Wednesday 15 October 2014 05:45 PM, Shahab Yunus wrote:
>>>>
>>>> It depends on the memory settings as well, i.e. how many resources you
>>>> want to assign to each container. YARN will then run as many mappers in
>>>> parallel as possible.
>>>>
>>>>  See this:
>>>> http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
>>>>
>>>> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
>>>>
>>>>  Regards,
>>>> Shahab
>>>>
>>>> On Wed, Oct 15, 2014 at 8:09 AM, SACHINGUPTA <sachin@datametica.com>
>>>> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I have a situation in which I have a machine with 4 processors and I
>>>>> have 5 containers, so does it mean I can have only 4 mappers running in
>>>>> parallel at a time?
>>>>>
>>>>> And if the number of mappers is not dependent on the number of
>>>>> containers in a machine, then what is the use of the container concept?
>>>>>
>>>>> Sorry if I have asked anything obvious.
>>>>>
>>>>> --
>>>>> Thanks
>>>>> Sachin Gupta
>>>>>
>>>>>
>>>>
>>>> --
>>>> Thanks
>>>> Sachin Gupta
>>>>
>>>>
>>>
>>> --
>>> Thanks
>>> Sachin Gupta
>>>
>>>
>>
>>   --
>> Thanks
>> Sachin Gupta
>>
>>
>
> --
> Thanks
> Sachin Gupta
>
>
