hadoop-mapreduce-user mailing list archives

From Ray Chiang <rchi...@cloudera.com>
Subject Re: YARN CapacityScheduler stuck trying to fulfill reservation
Date Fri, 15 Apr 2016 21:55:58 GMT
You could also try setting the mapreduce.job.reduce.slowstart.completedmaps
property to 1 so that the reducers don't start until all the maps are
complete.
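
For example, in mapred-site.xml that would look something like:

<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>1</value>
</property>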

-Ray


On Thu, Apr 14, 2016 at 5:39 PM, Wangda Tan <wheeleast@gmail.com> wrote:

> It seems you hit MAPREDUCE-6302.
>
> Patching it yourself or waiting for the 2.7.3 release should solve your
> problem.
>
> On Wed, Apr 13, 2016 at 11:27 AM, Joseph Naegele <
> jnaegele@grierforensics.com> wrote:
>
>> I'm using Hadoop 2.7.1.
>>
>> I'm running an MR job on 9 nodes. Everything was working smoothly until it
>> reached (map 99%, reduce 10%). Here are the relevant lines from my
>> ResourceManager logs:
>>
>> 2016-04-13 14:19:07,930 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>> Trying to fulfill reservation for application
>> application_1460557956992_0002 on node: ip-10-0-3-14:36536
>> 2016-04-13 14:19:07,930 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Reserved container  application=application_1460557956992_0002
>> resource=<memory:5000, vCores:1> queue=default: capacity=1.0,
>> absoluteCapacity=1.0, usedResources=<memory:47500, vCores:10>,
>> usedCapacity=0.8590133, absoluteUsedCapacity=0.8590133, numApps=1,
>> numContainers=10 usedCapacity=0.8590133 absoluteUsedCapacity=0.8590133
>> used=<memory:47500, vCores:10> cluster=<memory:55296, vCores:27>
>> 2016-04-13 14:19:07,930 INFO
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>> Skipping scheduling since node ip-10-0-3-14:36536 is reserved by
>> application appattempt_1460557956992_0002_000001
>>
>> Those three lines have repeated for the past hour and the MR job has not
>> progressed. The node in question (ip-10-0-3-14) is running the
>> ApplicationMaster. From what I can tell, it looks like I'm at capacity and
>> the scheduler got itself stuck, unable to allocate the next needed
>> container, although my understanding is pretty limited.
>>
>> Here are the resource sections of my yarn-site.xml and mapred-site.xml:
>>
>> yarn-site.xml:
>> <property>
>>   <name>yarn.nodemanager.resource.memory-mb</name>
>>   <value>6144</value>
>> </property>
>> <property>
>>   <name>yarn.nodemanager.resource.cpu-vcores</name>
>>   <value>3</value>
>> </property>
>> <property>
>>   <name>yarn.scheduler.minimum-allocation-mb</name>
>>   <value>2500</value>
>> </property>
>> <property>
>>   <name>yarn.scheduler.minimum-allocation-vcores</name>
>>   <value>1</value>
>> </property>
>>
>> mapred-site.xml:
>> <property>
>>   <name>mapreduce.map.memory.mb</name>
>>   <value>3000</value>
>> </property>
>> <property>
>>   <name>mapreduce.map.cpu.vcores</name>
>>   <value>1</value>
>> </property>
>> <property>
>>   <name>mapreduce.reduce.memory.mb</name>
>>   <value>3000</value>
>> </property>
>> <property>
>>   <name>mapreduce.reduce.cpu.vcores</name>
>>   <value>2</value>
>> </property>
>> <property>
>>   <name>mapreduce.map.java.opts</name>
>>   <value>-Xmx896m</value>
>> </property>
>> <property>
>>   <name>mapreduce.reduce.java.opts</name>
>>   <value>-Xmx1536m</value>
>> </property>
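>>
>> If I understand container normalization correctly, a rough sanity check of
>> how these numbers line up with the log above: 9 nodes x 6144 MB = 55296 MB
>> and 9 x 3 = 27 vcores (matching cluster=<memory:55296, vCores:27>), and a
>> 3000 MB request gets rounded up to the next multiple of the 2500 MB minimum
>> allocation, i.e. 5000 MB (matching the reserved resource=<memory:5000,
>> vCores:1>), so each task container effectively takes 5000 of a node's 6144 MB.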
>>
>> Any ideas as to what's going on, or how to prevent this?
>>
>>
>
