hadoop-yarn-dev mailing list archives

From "Anubhav Dhoot (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-4046) NM container recovery is broken on some linux distro because of syntax of signal
Date Tue, 11 Aug 2015 18:41:45 GMT
Anubhav Dhoot created YARN-4046:
-----------------------------------

             Summary: NM container recovery is broken on some linux distro because of syntax of signal
                 Key: YARN-4046
                 URL: https://issues.apache.org/jira/browse/YARN-4046
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: Anubhav Dhoot
            Assignee: Anubhav Dhoot
            Priority: Critical


On a Debian machine we have seen NodeManager recovery of containers fail because the signal
syntax used to address a process group may not work there. The check for whether a container's
process is still alive fails during container recovery, which causes the container to be
declared LOST (exit code 154) when the NodeManager restarts.
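A minimal sketch of the kind of liveness check described above, assuming the check sends signal 0 to the container's whole process group. The report does not show the exact command the NodeManager runs, so both helpers below are hypothetical. `process_group_alive` uses the `killpg` system call directly, which avoids command-line parsing entirely. `kill_check_argv` shows the shell-level form with `--` before the negative operand. POSIX advises this so that `-<pgid>` is read as a process group rather than as an option; whether this is the fix for this issue is an assumption.

```python
import os
from typing import List


def process_group_alive(pgid: int) -> bool:
    """Hypothetical helper: True if any process in group `pgid` still exists.

    Signal 0 delivers nothing, but the kernel still performs the
    existence and permission checks for the target process group.
    """
    try:
        os.killpg(pgid, 0)
    except ProcessLookupError:
        # ESRCH: no process in the group exists any more.
        return False
    except PermissionError:
        # EPERM: the group exists, we just may not signal it.
        return True
    return True


def kill_check_argv(pgid: int) -> List[str]:
    """Hypothetical helper: argv for the equivalent shell-level check.

    `--` ends option parsing, so the negative operand `-<pgid>` is
    treated as a process group id instead of a signal or option.
    """
    return ["kill", "-0", "--", f"-{pgid}"]
```

For example, `process_group_alive(os.getpgrp())` is `True` for the caller's own group, and `kill_check_argv(1234)` yields `["kill", "-0", "--", "-1234"]`. Calling the system call directly sidesteps any differences between `kill` implementations across distributions.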

The application then fails with the following error:
{noformat}
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_000001 exited with exitCode: 154
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
