hadoop-hdfs-user mailing list archives

From "Joseph Naegele" <jnaeg...@grierforensics.com>
Subject YARN CapacityScheduler stuck trying to fulfill reservation
Date Wed, 13 Apr 2016 18:27:14 GMT
I'm using Hadoop 2.7.1.

I'm running an MR job on 9 nodes. Everything was working smoothly until it
reached (map 99%, reduce 10%). Here are the relevant lines from my
ResourceManager logs:

2016-04-13 14:19:07,930 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1460557956992_0002 on node: ip-10-0-3-14:36536
2016-04-13 14:19:07,930 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container  application=application_1460557956992_0002 resource=<memory:5000, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:47500, vCores:10>, usedCapacity=0.8590133, absoluteUsedCapacity=0.8590133, numApps=1, numContainers=10 usedCapacity=0.8590133 absoluteUsedCapacity=0.8590133 used=<memory:47500, vCores:10> cluster=<memory:55296, vCores:27>
2016-04-13 14:19:07,930 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Skipping scheduling since node ip-10-0-3-14:36536 is reserved by application appattempt_1460557956992_0002_000001

Those three lines have repeated for the past hour and the MR job has not
progressed. The node in question (ip-10-0-3-14) is running the
ApplicationMaster. From what I can tell, I'm at capacity and the scheduler
is stuck, unable to allocate the next container it needs, although my
understanding is pretty limited.
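
As a quick sanity check on the "at capacity" reading, the numbers in the LeafQueue line can be pulled out and compared. This is only a sketch: the log line is abridged to the three fields used, and `memory_mb` is a hypothetical helper written for this message, not part of any Hadoop tooling.

```python
import re

# LeafQueue line from the ResourceManager log, abridged to the fields used here.
LOG_LINE = (
    "Reserved container application=application_1460557956992_0002 "
    "resource=<memory:5000, vCores:1> queue=default: "
    "used=<memory:47500, vCores:10> cluster=<memory:55296, vCores:27>"
)


def memory_mb(field: str, line: str) -> int:
    """Return N from a `field=<memory:N, vCores:M>` entry (hypothetical helper)."""
    match = re.search(rf"\b{field}=<memory:(\d+), vCores:\d+>", line)
    if match is None:
        raise ValueError(f"{field} not found in log line")
    return int(match.group(1))


requested = memory_mb("resource", LOG_LINE)  # size of the reserved container
used = memory_mb("used", LOG_LINE)           # memory the queue reports as used
cluster = memory_mb("cluster", LOG_LINE)     # total memory across all nodes
free = cluster - used                        # aggregate free memory

print(f"requested={requested} MB, free={free} MB, used fraction={used / cluster:.7f}")
```

The aggregate free memory (7796 MB) is larger than the 5000 MB being requested, so the cluster as a whole is not full. That free memory is a cluster-wide total, though, and nothing in the log says any single node has 5000 MB available.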

Here are the resource sections of my yarn-site.xml and mapred-site.xml:

yarn-site.xml:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>6144</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>3</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2500</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>

mapred-site.xml:
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>3000</value>
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3000</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>2</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx896m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1536m</value>
</property>
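
For reference, here is a rough sketch of the per-node arithmetic these settings imply. It rests on two assumptions that are not confirmed anywhere above: that the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb (which would explain the 5000 MB container in the log for a 3000 MB request), and that the ApplicationMaster asks for the stock default of 1536 MB, since yarn.app.mapreduce.am.resource.mb is not set in the config shown.

```python
NODE_MEMORY_MB = 6144    # yarn.nodemanager.resource.memory-mb
MIN_ALLOC_MB = 2500      # yarn.scheduler.minimum-allocation-mb
TASK_REQUEST_MB = 3000   # mapreduce.map.memory.mb and mapreduce.reduce.memory.mb
AM_REQUEST_MB = 1536     # ASSUMPTION: default yarn.app.mapreduce.am.resource.mb
NODES = 9


def normalize(request_mb: int, step_mb: int) -> int:
    """ASSUMPTION: requests are rounded up to the next multiple of step_mb."""
    return -(-request_mb // step_mb) * step_mb  # integer ceiling division


task_container_mb = normalize(TASK_REQUEST_MB, MIN_ALLOC_MB)
am_container_mb = normalize(AM_REQUEST_MB, MIN_ALLOC_MB)

# How many task containers fit on an otherwise empty node, and what is left over.
tasks_per_node = NODE_MEMORY_MB // task_container_mb
leftover_mb = NODE_MEMORY_MB - tasks_per_node * task_container_mb

# Memory remaining on the node that hosts the ApplicationMaster.
am_node_free_mb = NODE_MEMORY_MB - am_container_mb

print(f"task container: {task_container_mb} MB, AM container: {am_container_mb} MB")
print(f"task containers per node: {tasks_per_node}, unused per node: {leftover_mb} MB")
print(f"free on AM node: {am_node_free_mb} MB, "
      f"fits a task container: {am_node_free_mb >= task_container_mb}")
print(f"cluster total: {NODES * NODE_MEMORY_MB} MB")
```

Under those assumptions each node holds only one task container, and the node hosting the ApplicationMaster has 3644 MB left, which is less than one 5000 MB task container. The cluster total of 55296 MB matches the log.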

Any ideas as to what's going on, or how to prevent this?

