hadoop-mapreduce-user mailing list archives

From Alec Ten Harmsel <a...@alectenharmsel.com>
Subject Re: Hadoop YARM Cluster Setup Questions
Date Sat, 23 Aug 2014 18:08:25 GMT
On Sat 23 Aug 2014 01:52:38 PM EDT, S.L wrote:
> That's what I thought too, but please check Answer #2 to this
> question; I am facing a similar problem.
>
> http://stackoverflow.com/questions/12135949/why-map-task-always-running-on-a-single-node

We were having the same problem; a job with 50 map tasks would end up
with all 50 running on a single datanode (our datanodes have 64GB of
memory). What I did to fix it was change the following configuration
values in mapred-site.xml:

    mapreduce.map.memory.mb
    mapreduce.map.java.opts
    mapreduce.reduce.memory.mb
    mapreduce.reduce.java.opts

These control the amount of memory allocated to each map and reduce
task. Our machines have 12 cores, so we wanted ~16-20 tasks per node
instead of the 63 per node we were getting; as far as I know,
"mapreduce.map.memory.mb" defaults to 1024, which is why so many tasks
fit on one 64GB node. If you set these values appropriately (memory in
box / tasks per node), you should be good to go. Also, each of the
"java.opts" values should be "-Xmx##M", where ## is the JVM heap size
in MB; it should be somewhat smaller than the matching "memory.mb"
value.

Both mapreduce.map.memory.mb and mapreduce.reduce.memory.mb are 3072 in 
our installation, resulting in around 20 tasks per node.
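
A sketch of what the corresponding mapred-site.xml entries could look
like. The 3072 values are the ones quoted above; the -Xmx2560M heap
sizes are an assumption for illustration only (picked to sit below the
3072MB task size), not values taken from this installation:

    <configuration>
      <property>
        <name>mapreduce.map.memory.mb</name>
        <value>3072</value>
      </property>
      <property>
        <name>mapreduce.map.java.opts</name>
        <!-- assumed heap size; keep it below mapreduce.map.memory.mb -->
        <value>-Xmx2560M</value>
      </property>
      <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>3072</value>
      </property>
      <property>
        <name>mapreduce.reduce.java.opts</name>
        <!-- assumed heap size; keep it below mapreduce.reduce.memory.mb -->
        <value>-Xmx2560M</value>
      </property>
    </configuration>

With 64GB (65536MB) per node, 65536 / 3072 is about 21, which matches
the "around 20 tasks per node" figure once some memory is left for
everything else on the box.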

Please note that I'm not sure if this is the "official" solution, but I
could not find a better one, since the old way of assigning a certain
number of maps per node was deprecated. Also, as mentioned earlier in
this thread, you do need to have enough input splits before tasks will
be assigned to multiple nodes.

Hope this helps,

Alec
