hadoop-yarn-issues mailing list archives

From "Joseph Francis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4331) Restarting NodeManager leaves orphaned containers
Date Mon, 09 Nov 2015 14:02:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996559#comment-14996559 ]

Joseph Francis commented on YARN-4331:
--------------------------------------

[~jlowe] Setting yarn.nodemanager.recovery.enabled=true does solve the issue with orphaned containers.
Note that the SIGKILL was only done locally to emulate a few production issues we had that caused nodemanagers to fall over.
Thanks very much for your clear explanation!
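
For reference, this is roughly what the fix looks like in yarn-site.xml (a sketch only; the recovery directory and the fixed NM port below are our local choices, not required values):

{code:xml}
<!-- Enable NodeManager work-preserving restart so running containers are
     reacquired (or cleaned up) when the NM comes back after a crash. -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>

<!-- Local directory where the NM persists its container state.
     The path here is just an example from our setup. -->
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/hadoop/yarn/nm-recovery</value>
</property>

<!-- The NM restart documentation also suggests pinning the NM to a fixed
     port rather than an ephemeral one; the port below is our choice. -->
<property>
  <name>yarn.nodemanager.address</name>
  <value>${yarn.nodemanager.hostname}:45454</value>
</property>
{code}

With these settings in place, killing and restarting the nodemanager no longer left orphaned containers behind in our local test.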

> Restarting NodeManager leaves orphaned containers
> -------------------------------------------------
>
>                 Key: YARN-4331
>                 URL: https://issues.apache.org/jira/browse/YARN-4331
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager, yarn
>    Affects Versions: 2.7.1
>            Reporter: Joseph Francis
>            Priority: Critical
>
> We are seeing a lot of orphaned containers running in our production clusters.
> I tried to simulate this locally on my machine and can replicate the issue by killing the nodemanager.
> I'm running YARN 2.7.1 with RM state stored in ZooKeeper and deploying Samza jobs.
> Steps:
> {quote}1. Deploy a job
> 2. Issue a kill -9 signal to the nodemanager
> 3. The AM and its container should still be running without the nodemanager
> 4. The AM should then die, but the container keeps running
> 5. Restarting the nodemanager brings up a new AM and container but leaves the orphaned container running in the background
> {quote}
> This is effectively causing double processing of data.



