hadoop-yarn-dev mailing list archives

From "Danil Serdyuchenko (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-4549) Containers stuck in KILLING state
Date Wed, 06 Jan 2016 10:40:39 GMT
Danil Serdyuchenko created YARN-4549:

             Summary: Containers stuck in KILLING state
                 Key: YARN-4549
                 URL: https://issues.apache.org/jira/browse/YARN-4549
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.7.1
            Reporter: Danil Serdyuchenko

We are running Samza 0.8 on YARN 2.7.1 with {{LinuxContainerExecutor}} as the container executor,
configured with cgroups. NM recovery is also enabled.

We observe many containers that get stuck in the KILLING state after the NM tries to
kill them. The containers remain running indefinitely, which causes duplication because new
containers are brought up to replace them. Judging from the logs, the NM apparently cannot get
the container PID:

16/01/05 05:16:44 INFO containermanager.ContainerManagerImpl: Stopping container with container Id: container_1448454866800_0023_01_000005
16/01/05 05:16:44 INFO nodemanager.NMAuditLogger: USER=ec2-user IP=        OPERATION=Stop Container Request        TARGET=ContainerManageImpl      RESULT=SUCCESS  APPID=application_1448454866800_0023
16/01/05 05:16:44 INFO container.ContainerImpl: Container container_1448454866800_0023_01_000005 transitioned from RUNNING to KILLING
16/01/05 05:16:44 INFO launcher.ContainerLaunch: Cleaning up container container_1448454866800_0023_01_000005
16/01/05 05:16:47 INFO launcher.ContainerLaunch: Could not get pid for container_1448454866800_0023_01_000005. Waited for 2000 ms.
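
For anyone trying to gauge how many containers are affected, the log pattern above can be searched for mechanically. The following is only a rough sketch, not part of the NM or any Hadoop tooling: the function name find_stuck_containers is hypothetical, and it assumes the log messages have exactly the wording shown in the excerpt above. It reports container IDs that both transitioned to KILLING and logged "Could not get pid", and it tolerates log lines that were wrapped by a mail client.

```python
import re

# Container IDs as they appear in the excerpt, e.g.
# container_1448454866800_0023_01_000005
_CID = r"container_\d+_\d+_\d+_\d+"

# Message wording is taken from the log excerpt; other Hadoop versions
# may phrase these lines differently (assumption).
_KILLING = re.compile(r"Container (%s) transitioned from \w+ to KILLING" % _CID)
_NO_PID = re.compile(r"Could not get pid for (%s)" % _CID)


def find_stuck_containers(log_text):
    """Return IDs of containers that entered KILLING and also logged
    'Could not get pid', in order of first pid failure."""
    # Collapse all whitespace so that messages split across wrapped
    # lines can still be matched by the patterns above.
    flat = " ".join(log_text.split())

    killing = {m.group(1) for m in _KILLING.finditer(flat)}

    stuck = []
    for m in _NO_PID.finditer(flat):
        cid = m.group(1)
        if cid in killing and cid not in stuck:
            stuck.append(cid)
    return stuck


if __name__ == "__main__":
    import sys

    # Usage: python find_stuck.py < nodemanager.log
    for container_id in find_stuck_containers(sys.stdin.read()):
        print(container_id)
```

The sketch does not check the order of the two messages or whether the process is still alive; it only narrows down which containers to inspect on the node.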

The PID files for each container seem to be present on the node. We weren't able to consistently
reproduce this and are hoping that someone has come across it before.

This message was sent by Atlassian JIRA
