hadoop-yarn-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
Date Tue, 10 Feb 2015 17:35:12 GMT

    [ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314500#comment-14314500 ]

Hudson commented on YARN-2809:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #7063 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7063/])
YARN-2809. Implement workaround for linux kernel panic when removing cgroup. Contributed by Nathan Roberts (jlowe: rev 3f5431a22fcef7e3eb9aceeefe324e5b7ac84049)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


> Implement workaround for linux kernel panic when removing cgroup
> ----------------------------------------------------------------
>
>                 Key: YARN-2809
>                 URL: https://issues.apache.org/jira/browse/YARN-2809
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>         Environment:  RHEL 6.4
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>             Fix For: 2.7.0
>
>         Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch
>
>
> Some older versions of Linux have a bug that can cause a kernel panic when the LCE (LinuxContainerExecutor) attempts to remove a cgroup. It is a race condition, so it is fairly rare, but on a cluster of a few thousand nodes it can result in a couple of panics per day.
> This is the commit that likely (haven't verified) fixes the problem in Linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267
> Details will be added in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
