hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2283) RM failed to release the AM container
Date Wed, 30 Jul 2014 12:26:38 GMT

    [ https://issues.apache.org/jira/browse/YARN-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079234#comment-14079234
] 

Sunil G commented on YARN-2283:
-------------------------------

I tried to reproduce this and found that the AM memory is released immediately.
Could you please try to reproduce it again and share the exact steps?

> RM failed to release the AM container
> -------------------------------------
>
>                 Key: YARN-2283
>                 URL: https://issues.apache.org/jira/browse/YARN-2283
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>         Environment: NM1: AM running
> NM2: Map task running
> mapreduce.map.maxattempts=1
>            Reporter: Nishan Shetty
>            Priority: Critical
>
> During a container stability test I hit the following problem.
> While the job was running, a map task was killed.
> Even though the application is FAILED, the MRAppMaster process keeps running until the timeout because the RM did not release the AM container.
> {code}
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1405318134611_0002_01_000005 Container Transitioned from RUNNING to COMPLETED
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
Completed container: container_1405318134611_0002_01_000005 in state: COMPLETED event:FINISHED
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger:
USER=testos	OPERATION=AM Released Container	TARGET=SchedulerApp	RESULT=SUCCESS	APPID=application_1405318134611_0002
CONTAINERID=container_1405318134611_0002_01_000005
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
Finish information of container container_1405318134611_0002_01_000005 is written
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter:
Stored the finish data of container container_1405318134611_0002_01_000005
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode:
Released container container_1405318134611_0002_01_000005 of capacity <memory:1024, vCores:1>
on host HOST-10-18-40-153:45026, which currently has 1 containers, <memory:2048, vCores:1>
used and <memory:6144, vCores:7> available, release resources=true
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
default used=<memory:2048, vCores:1> numContainers=1 user=testos user-resources=<memory:2048,
vCores:1>
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
completedContainer container=Container: [ContainerId: container_1405318134611_0002_01_000005,
NodeId: HOST-10-18-40-153:45026, NodeHttpAddress: HOST-10-18-40-153:45025, Resource: <memory:1024,
vCores:1>, Priority: 5, Token: Token { kind: ContainerToken, service: 10.18.40.153:45026
}, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:2048, vCores:1>,
usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=1 cluster=<memory:8192,
vCores:8>
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
completedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used=<memory:2048,
vCores:1> cluster=<memory:8192, vCores:8>
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0,
usedResources=<memory:2048, vCores:1>, usedCapacity=0.25, absoluteUsedCapacity=0.25,
numApps=1, numContainers=1
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Application attempt appattempt_1405318134611_0002_000001 released container container_1405318134611_0002_01_000005
on node: host: HOST-10-18-40-153:45026 #containers=1 available=6144 used=2048 with event:
FINISHED
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
Updating application attempt appattempt_1405318134611_0002_000001 with final state: FINISHING
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1405318134611_0002_000001 State change from RUNNING to FINAL_SAVING
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
Updating application application_1405318134611_0002 with final state: FINISHING
> 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002/appattempt_1405318134611_0002_000001
for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
STARTED
> 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1405318134611_0002 State change from RUNNING to FINAL_SAVING
> 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
Storing info for app: application_1405318134611_0002
> 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1405318134611_0002_000001 State change from FINAL_SAVING to FINISHING
> 2014-07-14 14:43:35,012 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
Watcher event type: NodeDataChanged with state:SyncConnected for path:/rmstore/ZKRMStateRoot/RMAppRoot/application_1405318134611_0002
for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
STARTED
> 2014-07-14 14:43:35,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1405318134611_0002 State change from FINAL_SAVING to FINISHING
> {code}
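
When comparing a fresh reproduction against the excerpt above, it may help to reduce an RM log to the last state seen for each container, attempt, and application. The following is only a minimal sketch: it assumes the log keeps the "Transitioned from X to Y" and "State change from X to Y" wording shown in the excerpt, and the names `TRANSITION`, `last_states`, and `SAMPLE` are hypothetical helpers, not part of Hadoop.

```python
import re

# Matches "<id> Container Transitioned from X to Y" (containers) and
# "<id> State change from X to Y" (applications and attempts),
# the two wordings that appear in the quoted RM log excerpt.
TRANSITION = re.compile(
    r"(?P<entity>(?:application|appattempt|container)_\w+)\s+"
    r"(?:Container\s+)?(?:State change|Transitioned) from "
    r"(?P<old>\w+) to (?P<new>\w+)"
)


def last_states(log_text):
    """Return {entity_id: last_observed_state} for an RM log excerpt."""
    # The mail wraps each log record over several lines, so collapse all
    # whitespace first; the ID and its transition then sit side by side.
    flat = " ".join(log_text.split())
    states = {}
    for match in TRANSITION.finditer(flat):
        states[match.group("entity")] = match.group("new")
    return states


# A few records copied from the excerpt, including the mail's line wrapping.
SAMPLE = """\
> 2014-07-14 14:43:33,899 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1405318134611_0002_01_000005 Container Transitioned from RUNNING to COMPLETED
> 2014-07-14 14:43:34,924 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1405318134611_0002_000001 State change from RUNNING to FINAL_SAVING
> 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1405318134611_0002 State change from RUNNING to FINAL_SAVING
> 2014-07-14 14:43:34,947 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
appattempt_1405318134611_0002_000001 State change from FINAL_SAVING to FINISHING
> 2014-07-14 14:43:35,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
application_1405318134611_0002 State change from FINAL_SAVING to FINISHING
"""

if __name__ == "__main__":
    for entity, state in sorted(last_states(SAMPLE).items()):
        print(entity, state)
```

For the records in the excerpt, both the application and its attempt end in FINISHING, while the map container ends in COMPLETED; running the same reduction over a longer RM log would show whether and when they move on from there.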



--
This message was sent by Atlassian JIRA
(v6.2#6252)
