Date: Wed, 5 Nov 2014 16:09:34 +0000 (UTC)
From: "Nathan Roberts (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup

    [ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198560#comment-14198560 ]

Nathan Roberts commented on YARN-2809:
--------------------------------------

Stack trace:
{noformat}
[] ? panic+0xa7/0x16f
[] ? oops_end+0xe4/0x100
[] ? no_context+0xfb/0x260
[] ? dev_hard_start_xmit+0x308/0x530
[] ? __bad_area_nosemaphore+0x125/0x1e0
[] ? cpumask_next_and+0x29/0x50
[] ? bad_area_nosemaphore+0x13/0x20
[] ? __do_page_fault+0x321/0x480
[] ? update_curr+0xe1/0x1f0
[] ? enqueue_entity+0x125/0x410
[] ? set_next_buddy+0x43/0x50
[] ? check_preempt_wakeup+0x1c0/0x260
[] ? enqueue_task_fair+0xfb/0x100
[] ? check_preempt_curr+0x7c/0x90
[] ? do_page_fault+0x3e/0xa0
[] ? page_fault+0x25/0x30
[] ? update_cfs_shares+0x29/0x170
[] ? dequeue_entity+0x113/0x2e0
[] ? dequeue_task_fair+0x6a/0x130
[] ? dequeue_task+0x8e/0xb0
[] ? deactivate_task+0x23/0x30
[] ? thread_return+0x127/0x76e
[] ? call_rcu+0xe/0x10
[] ? release_task+0x33f/0x4b0
[] ? do_exit+0x5b7/0x870
[] ? do_group_exit+0x58/0xd0
[] ? get_signal_to_deliver+0x1f6/0x460
[] ? do_signal+0x75/0x800
[] ? __audit_syscall_exit+0x265/0x290
[] ? do_notify_resume+0x90/0xc0
[] ? int_signal+0x12/0x17
{noformat}

What's happening is that CgroupsLCEResourcesHandler attempts to delete the cgroup before all the tasks within it have exited (explained below). It retries the removal every 20ms until it succeeds or a timeout (default 1 second) expires. Sometimes one of these attempts hits a race within the kernel: the last task has not completely finished tearing down, yet it is far enough along that the cgroup can be removed. This leaves a NULL pointer behind, which results in the panic.

The kernel has been fixed and most recent distributions will have the fix. However, there are older kernel versions out there that would benefit from a simple workaround. The proposed workaround is to wait until the "tasks" file within the cgroup is empty, and then delay a small amount of time before attempting to delete the cgroup.

One question is why there are still tasks in the cgroup at that point. I don't have a complete answer and some of the details may be slightly off, but I do know the following: the process tree within a MapReduce cgroup looks like "bash -c" -> "java ...". When map or reduce processing is complete, the AM is informed, and it in turn informs the NM so that the container can be torn down. A SIGTERM is sent to the session (bash is the session leader).
bash exits much more quickly than everything else, so its parent (container-executor) gets a SIGCHLD and starts cleaning up. The cleanup includes removing the cgroup, which gets us into the race described above.


> Implement workaround for linux kernel panic when removing cgroup
> ----------------------------------------------------------------
>
> Key: YARN-2809
> URL: https://issues.apache.org/jira/browse/YARN-2809
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.6.0
> Environment: RHEL 6.4
> Reporter: Nathan Roberts
> Assignee: Nathan Roberts
>
> Some older versions of Linux have a bug that can cause a kernel panic when the LCE attempts to remove a cgroup. It is a race condition, so it is fairly rare, but on a cluster of a few thousand nodes it can result in a couple of panics per day.
> This is the commit that likely (haven't verified) fixes the problem in Linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.y&id=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267
> Details will be added in comments.
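
For illustration only, here is a minimal sketch of the workaround proposed in the comment: wait for the cgroup's "tasks" file to become empty, pause briefly, and only then remove the cgroup. It is written in Python for brevity and is not the CgroupsLCEResourcesHandler code. The function names, the 50ms settle delay, and the injectable rmdir parameter are all assumptions made for this sketch. The 20ms poll interval and 1 second timeout simply reuse the values the comment gives for the existing retry loop.

```python
import os
import time


def tasks_file_is_empty(cgroup_path):
    """Return True when the cgroup's "tasks" file lists no task IDs."""
    try:
        with open(os.path.join(cgroup_path, "tasks")) as f:
            return f.read().strip() == ""
    except FileNotFoundError:
        # Assumption: a missing file means the cgroup is already gone.
        return True


def remove_cgroup_when_drained(cgroup_path, timeout_s=1.0, poll_s=0.02,
                               settle_s=0.05, rmdir=os.rmdir):
    """Remove a cgroup only after its tasks file has drained.

    Returns True if the removal call succeeded, False if tasks were still
    present at the timeout or the removal itself failed.
    """
    deadline = time.monotonic() + timeout_s
    while not tasks_file_is_empty(cgroup_path):
        if time.monotonic() >= deadline:
            return False  # tasks still present: do not attempt removal
        time.sleep(poll_s)
    # Hypothetical settle delay: give the last task time to finish tearing
    # down inside the kernel before the cgroup is removed.
    time.sleep(settle_s)
    try:
        rmdir(cgroup_path)
    except OSError:
        return False
    return True
```

Unlike the retry loop described in the comment, this sketch never attempts the removal while tasks are still listed. On a real cgroup filesystem, rmdir succeeds on a cgroup directory even though the kernel-provided control files still appear in it. The rmdir parameter exists only so the logic can be exercised against an ordinary directory.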