hadoop-common-user mailing list archives

From Adam Kawa <kawa.a...@gmail.com>
Subject Re: Relationship between heap sizes and mapred.child.java.opt configuration
Date Thu, 28 Nov 2013 22:54:17 GMT
> Thanks for the reply. So what is the purpose of heap sizes for
> tasktrackers and datanodes then?
>

TaskTrackers and DataNodes are long-running daemons (written in Java)
that run on slave nodes, each in its own separate JVM. I usually give at
least 1GB to each of them in production clusters.
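The daemon heap sizes are configured in hadoop-env.sh rather than in the job configuration. A minimal sketch (the exact variable names can differ between Hadoop releases, so treat them as assumptions to verify against your version):

```sh
# hadoop-env.sh (Hadoop 1.x style) -- a sketch, verify variable names for your release.

# Default heap size, in MB, for Hadoop daemons that do not override it:
export HADOOP_HEAPSIZE=1000

# Per-daemon overrides via extra JVM options, e.g. 1 GB each for the
# DataNode and TaskTracker daemons on every slave node:
export HADOOP_DATANODE_OPTS="-Xmx1024m $HADOOP_DATANODE_OPTS"
export HADOOP_TASKTRACKER_OPTS="-Xmx1024m $HADOOP_TASKTRACKER_OPTS"
```

These settings only affect the daemons themselves; task JVMs are sized separately via mapred.child.java.opts.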


> In other words, if I want to speed up the map/reduce cycle, can I just
> minimize the heap sizes and maximize "mapred.child.java.opts", or will
> minimizing the heap sizes cause an out-of-memory exception?
>

The higher the memory in mapred.child.java.opts, the less frequently your
tasks spill key-value pairs to disk, so they run a bit more efficiently
(read also about the configuration property "io.sort.mb"). However, the
higher the memory in mapred.child.java.opts, the fewer tasks you can run
concurrently on a slave node. It is a trade-off.
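As an illustration, a mapred-site.xml fragment (MRv1 property names; the values are illustrative, not recommendations):

```xml
<!-- mapred-site.xml (MRv1) - illustrative values only -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- heap for each map/reduce task JVM -->
  <value>-Xmx512m</value>
</property>
<property>
  <name>io.sort.mb</name>
  <!-- in-memory sort buffer (MB); must fit comfortably inside -Xmx -->
  <value>128</value>
</property>
```

Note that io.sort.mb is allocated inside the task heap, so raising it without also raising -Xmx can itself cause out-of-memory errors.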

If you do not tune "mapred.child.java.opts" correctly, you might get an
Out-Of-Memory error (if your job consumes more memory than
mapred.child.java.opts allows). If you run too many tasks on a slave
node and exceed the amount of physical memory available on that node,
then

1) the node can start swapping (in Hadoop, heavy swapping usually makes a
node so slow that it is often considered "dead"),
2) or the kernel Out-Of-Memory Killer can start killing your TaskTracker
and the tasks it started.
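A quick back-of-envelope check for a slave node helps avoid this; a sketch with hypothetical numbers (16 GB node, 1 GB task heaps):

```xml
<!-- mapred-site.xml - hypothetical 16 GB slave node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
<!--
  Worst case: (8 + 4) task slots x 1 GB (-Xmx1024m in mapred.child.java.opts)
            = 12 GB for task JVMs
  + ~1 GB TaskTracker + ~1 GB DataNode + OS overhead  ~= 14-15 GB,
  which should stay below the node's 16 GB of physical RAM to avoid
  swapping or the OOM killer.
-->
```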

You can read about issues with swapping and OOM killer on my blog post:
http://hakunamapdata.com/two-memory-related-issues-on-the-apache-hadoop-cluster/


>
>
> On Mon, Nov 25, 2013 at 10:02 AM, Kai Voigt <k@123.org> wrote:
>
>> mapred.child.java.opts refers to the settings for the JVMs spawned
>> by the TaskTracker. These JVMs actually run the tasks (mappers and
>> reducers).
>>
>> The heap sizes for TaskTrackers and DataNodes are unrelated to those.
>> They run in their own JVMs each.
>>
>> Kai
>>
>> Am 25.11.2013 um 15:52 schrieb Chih-Hsien Wu <chjasonwu@gmail.com>:
>>
>> I'm learning about Hadoop configuration. What is the connection between
>> the datanode/tasktracker heap sizes and "mapred.child.java.opts"? Does
>> one have to exceed the other?
>>
>>
>>   ------------------------------
>> *Kai Voigt* Am Germaniahafen 1 k@123.org
>>  24143 Kiel +49 160 96683050
>>  Germany @KaiVoigt
>>
>>
>
