hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-389) Infinitely assigning containers when the required resource exceeds the cluster's absolute capacity
Date Fri, 08 Feb 2013 19:25:12 GMT

     [ https://issues.apache.org/jira/browse/YARN-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-389:
-----------------------------------------

    Description: I've run the wordcount example on branch-2 and trunk, with yarn.nodemanager.resource.memory-mb
set to 1G and yarn.app.mapreduce.am.resource.mb to 1.5G. As a result, the ResourceManager
tries to assign a 2G container for the AM (the 1.5G request rounded up to the scheduler's
allocation granularity), but the NodeManager doesn't have enough memory to host it. The
problem is that the assignment attempt is repeated infinitely when it can never succeed.
Logs follow.  (was: I've run the wordcount example on branch-2 and trunk, with
yarn.nodemanager.resource.memory-mb set to 1G and yarn.app.mapreduce.am.resource.mb to 1.5G.
As a result, the ResourceManager tries to assign a 2G container for the AM, but the NodeManager
doesn't have enough memory to host it. The problem is that the assignment attempt is repeated
infinitely when it can never succeed. See the following log.

{code}
2013-02-07 19:05:05,947 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService:
Allocated new applicationId: 1
2013-02-07 19:05:06,477 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService:
Storing Application with id application_1360292699925_0001
2013-02-07 19:05:06,479 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
Storing info for app: application_1360292699925_0001
2013-02-07 19:05:06,479 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService:
Application with id 1 submitted by user zshen
2013-02-07 19:05:06,481 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:
USER=zshen	IP=127.0.0.1	OPERATION=Submit Application Request	TARGET=ClientRMService	RESULT=SUCCESS
APPID=application_1360292699925_0001
2013-02-07 19:05:06,493 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1360292699925_0001 State change from NEW to SUBMITTED
2013-02-07 19:05:06,494 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
Registering appattempt_1360292699925_0001_000001
2013-02-07 19:05:06,495 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1360292699925_0001_000001 State change from NEW to SUBMITTED
2013-02-07 19:05:06,506 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Application application_1360292699925_0001 from user: zshen activated in queue: default
2013-02-07 19:05:06,506 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Application added - appId: application_1360292699925_0001 user: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@4965d0e0,
leaf-queue: default #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications:
0 #queue-active-applications: 1
2013-02-07 19:05:06,506 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Application added - appId: application_1360292699925_0001 user: zshen leaf-queue of parent:
root #applications: 1
2013-02-07 19:05:06,506 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Application Submission: appattempt_1360292699925_0001_000001, user: zshen queue: default:
capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0,
absoluteUsedCapacity=0.0, numApps=1, numContainers=0, currently active: 1
2013-02-07 19:05:06,508 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1360292699925_0001_000001 State change from SUBMITTED to SCHEDULED
2013-02-07 19:05:06,509 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1360292699925_0001 State change from SUBMITTED to ACCEPTED
2013-02-07 19:05:07,163 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16>
currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity:
1.0)
2013-02-07 19:05:08,164 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16>
currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity:
1.0)
2013-02-07 19:05:09,167 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16>
currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity:
1.0)
...
2013-02-07 23:51:02,976 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16>
currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity:
1.0)
2013-02-07 23:51:03,977 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default usedResources: <memory:0, vCores:0> clusterResources: <memory:1024, vCores:16>
currentCapacity 0.0 required <memory:2048, vCores:1> potentialNewCapacity: 2.0 (  max-capacity:
1.0)
...
{code}

In my opinion, the attempt to assign containers should be terminated in the following two
cases:
1. Required > cluster's absolute capacity: the assignment can never be accomplished, so it
should fail immediately.
2. Required + already used > cluster's absolute capacity: the assignment should fail after a
certain number of assignment rounds or after a certain duration. The number of rounds and the
duration should be configurable.
)
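
For reference, the setup described above corresponds to configuration roughly like the
following (file placement as in a standard Hadoop deployment; 1536 MB is the 1.5G from the
description, which the scheduler would then normalize upward):

{code}
<!-- yarn-site.xml: total memory the NodeManager offers for containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1024</value>
</property>

<!-- mapred-site.xml: memory requested for the MapReduce AM container -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1536</value>
</property>
{code}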

Zhijie, a meta comment: it's better to state only the problem in the description (and summary)
and provide solutions and logs as a follow-up. Logs can go in follow-up comments or be
attached as separate files.
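
The two termination cases proposed in the description can be sketched as below. This is a
minimal illustration with a simplified resource model, not YARN's actual Resource/Resources
classes; all names and the retry-budget parameter are hypothetical:

{code}
// Hypothetical sketch of the proposed feasibility checks; not actual YARN code.
public class AssignmentFeasibility {

    /** Simplified stand-in for YARN's Resource: memory in MB plus virtual cores. */
    static final class Res {
        final int memoryMb;
        final int vcores;
        Res(int memoryMb, int vcores) { this.memoryMb = memoryMb; this.vcores = vcores; }
    }

    /** True if {@code required} could ever fit within {@code capacity}. */
    static boolean fitsIn(Res required, Res capacity) {
        return required.memoryMb <= capacity.memoryMb && required.vcores <= capacity.vcores;
    }

    /**
     * Case 1: required alone exceeds the cluster's absolute capacity -- fail immediately.
     * Case 2: required plus already-used resources exceed capacity -- fail once a
     * configurable retry budget is exhausted.
     */
    static boolean shouldGiveUp(Res required, Res used, Res cluster,
                                int attempts, int maxAttempts) {
        if (!fitsIn(required, cluster)) {
            return true;                       // case 1: can never succeed
        }
        Res total = new Res(required.memoryMb + used.memoryMb,
                            required.vcores + used.vcores);
        return !fitsIn(total, cluster) && attempts >= maxAttempts;  // case 2
    }

    public static void main(String[] args) {
        Res cluster = new Res(1024, 16);   // clusterResources from the log
        Res required = new Res(2048, 1);   // the 2G AM container request
        Res used = new Res(0, 0);
        // Case 1 applies: 2048 MB can never fit in a 1024 MB cluster.
        System.out.println(shouldGiveUp(required, used, cluster, 0, 100));
    }
}
{code}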


                
> Infinitely assigning containers when the required resource exceeds the cluster's absolute
capacity
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-389
>                 URL: https://issues.apache.org/jira/browse/YARN-389
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>
> I've run the wordcount example on branch-2 and trunk, with yarn.nodemanager.resource.memory-mb
set to 1G and yarn.app.mapreduce.am.resource.mb to 1.5G. As a result, the ResourceManager
tries to assign a 2G container for the AM (the 1.5G request rounded up to the scheduler's
allocation granularity), but the NodeManager doesn't have enough memory to host it. The
problem is that the assignment attempt is repeated infinitely when it can never succeed.
Logs follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
