hadoop-yarn-issues mailing list archives

From "Nathan Roberts (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
Date Wed, 05 Nov 2014 16:09:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198560#comment-14198560 ]

Nathan Roberts commented on YARN-2809:

Stack trace:
[<ffffffff8150d4a8>] ? panic+0xa7/0x16f
 [<ffffffff815116d4>] ? oops_end+0xe4/0x100
 [<ffffffff81046bfb>] ? no_context+0xfb/0x260
 [<ffffffff81449058>] ? dev_hard_start_xmit+0x308/0x530
 [<ffffffff81046e85>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff812773a9>] ? cpumask_next_and+0x29/0x50
 [<ffffffff81046f53>] ? bad_area_nosemaphore+0x13/0x20
 [<ffffffff810476b1>] ? __do_page_fault+0x321/0x480
 [<ffffffff81056881>] ? update_curr+0xe1/0x1f0
 [<ffffffff81065905>] ? enqueue_entity+0x125/0x410
 [<ffffffff810524e3>] ? set_next_buddy+0x43/0x50
 [<ffffffff810570e0>] ? check_preempt_wakeup+0x1c0/0x260
 [<ffffffff81065ceb>] ? enqueue_task_fair+0xfb/0x100
 [<ffffffff8105230c>] ? check_preempt_curr+0x7c/0x90
 [<ffffffff815135fe>] ? do_page_fault+0x3e/0xa0
 [<ffffffff815109b5>] ? page_fault+0x25/0x30
 [<ffffffff81056b19>] ? update_cfs_shares+0x29/0x170
 [<ffffffff81065363>] ? dequeue_entity+0x113/0x2e0
 [<ffffffff810664da>] ? dequeue_task_fair+0x6a/0x130
 [<ffffffff81055ebe>] ? dequeue_task+0x8e/0xb0
 [<ffffffff81055f03>] ? deactivate_task+0x23/0x30
 [<ffffffff8150dc99>] ? thread_return+0x127/0x76e
 [<ffffffff810e6e1e>] ? call_rcu+0xe/0x10
 [<ffffffff8107196f>] ? release_task+0x33f/0x4b0
 [<ffffffff81073837>] ? do_exit+0x5b7/0x870
 [<ffffffff81073b48>] ? do_group_exit+0x58/0xd0
 [<ffffffff81088e36>] ? get_signal_to_deliver+0x1f6/0x460
 [<ffffffff8100a265>] ? do_signal+0x75/0x800
 [<ffffffff810dc675>] ? __audit_syscall_exit+0x265/0x290
 [<ffffffff8100aa80>] ? do_notify_resume+0x90/0xc0
 [<ffffffff8100b341>] ? int_signal+0x12/0x17
What's happening is that CgroupsLCEResourcesHandler attempts to delete the cgroup before
all the tasks within the cgroup have exited (explained later). It retries the removal every 20ms
until it succeeds or a timeout (default 1 second) expires. Sometimes one of these attempts
hits a race within the kernel where the last task has not completely finished tearing down,
yet is far enough along that the cgroup can be removed. This leaves a NULL pointer
around, which results in the panic.
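
For illustration, the retry behavior described above can be sketched roughly as follows. This is not the actual CgroupsLCEResourcesHandler code: the class name, method name, and parameters are hypothetical, and the sketch only assumes that removing a cgroup amounts to an rmdir on its directory, which fails while the kernel still considers the cgroup busy.

```java
import java.io.File;

// Hypothetical sketch, not the real Hadoop class.
public class CgroupRetryDelete {

    /**
     * Tries to remove the cgroup directory every retryIntervalMs milliseconds
     * until the removal succeeds or timeoutMs milliseconds have elapsed.
     * Returns true if the directory was removed.
     */
    static boolean deleteWithRetry(File cgroupDir, long timeoutMs, long retryIntervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        do {
            // On a directory, File.delete() issues an rmdir. Nothing here
            // checks whether the last task has fully finished exiting.
            if (cgroupDir.delete()) {
                return true;
            }
            Thread.sleep(retryIntervalMs);
        } while (System.currentTimeMillis() < deadline);
        return false;
    }
}
```

With the values mentioned above this would be called as deleteWithRetry(dir, 1000, 20). The point of the sketch is that the loop races with task teardown: the first rmdir that succeeds may be one issued while the last task is still exiting.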

The kernel has been fixed and most recent distributions will have the fix. However, there
are older kernel versions out there that would benefit from a simple workaround. The proposed
workaround is to wait until the "tasks" file within the cgroup is empty, and then delay for a
short time before attempting to delete the cgroup.
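
A minimal sketch of the proposed workaround, under stated assumptions: the class, method names, and the timeout, poll, and settle parameters are all hypothetical and are not taken from an actual patch. It polls the "tasks" file until it is empty, sleeps for a short settle delay, and only then attempts the rmdir.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the workaround, not the committed Hadoop code.
public class CgroupDeleteWorkaround {

    /** Returns true when the cgroup's "tasks" file lists no task ids. */
    static boolean tasksFileIsEmpty(Path cgroupDir) throws IOException {
        Path tasks = cgroupDir.resolve("tasks");
        // Read the contents instead of checking the file size, since
        // virtual filesystems commonly report a size of 0.
        String contents = new String(Files.readAllBytes(tasks), StandardCharsets.UTF_8);
        return contents.trim().isEmpty();
    }

    /**
     * Waits up to timeoutMs for the "tasks" file to become empty, then sleeps
     * settleMs before attempting to remove the cgroup directory. Returns true
     * only if the directory was removed.
     */
    static boolean deleteWhenEmpty(Path cgroupDir, long timeoutMs, long pollMs, long settleMs)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!tasksFileIsEmpty(cgroupDir)) {
            if (System.nanoTime() >= deadline) {
                return false; // tasks never drained; do not attempt the rmdir
            }
            Thread.sleep(pollMs);
        }
        // Give the last exiting task a moment to finish tearing down.
        Thread.sleep(settleMs);
        return cgroupDir.toFile().delete();
    }
}
```

The settle delay is the part that addresses the race: an empty "tasks" file alone does not guarantee that the last task has finished tearing down inside the kernel. How long the delay needs to be is not stated here, so any value passed for settleMs is a guess.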

One question is why there are still tasks in the cgroup. I don't have a complete answer here
and some of the details may be slightly off, but I do know the following: the process tree within
a mapreduce cgroup looks like "bash -c" -> "java ...".
When map or reduce processing is complete, the AM is informed, and it then informs the NM so
that the container can be torn down. A SIGTERM is sent to the session (bash is the session leader).
bash exits much more quickly than everything else, so its parent (container-executor)
gets a SIGCHLD and starts cleaning up. The cleanup includes removing the cgroup, which gets us into
the race described above.

> Implement workaround for linux kernel panic when removing cgroup
> ----------------------------------------------------------------
>                 Key: YARN-2809
>                 URL: https://issues.apache.org/jira/browse/YARN-2809
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>         Environment:  RHEL 6.4
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
> Some older versions of Linux have a bug that can cause a kernel panic when the LCE attempts
> to remove a cgroup. It is a race condition so it's a bit rare, but on a cluster of a few thousand
> nodes it can result in a couple of panics per day.
> This is the commit that likely (haven't verified) fixes the problem in linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267
> Details will be added in comments.

This message was sent by Atlassian JIRA