Date: Tue, 18 Jun 2013 10:56:23 +0000 (UTC)
From: "Hudson (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-799) CgroupsLCEResourcesHandler tries to write to cgroup.procs


    [ https://issues.apache.org/jira/browse/YARN-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686574#comment-13686574 ]

Hudson commented on YARN-799:
-----------------------------

Integrated in Hadoop-Yarn-trunk #244 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/244/])
YARN-799. Fix CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs. Contributed by Chris Riccomini.
(Revision 1494035)

     Result = SUCCESS
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1494035
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java


> CgroupsLCEResourcesHandler tries to write to cgroup.procs
> ---------------------------------------------------------
>
>                 Key: YARN-799
>                 URL: https://issues.apache.org/jira/browse/YARN-799
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.4-alpha, 2.0.5-alpha
>            Reporter: Chris Riccomini
>            Assignee: Chris Riccomini
>             Fix For: 2.1.0-beta
>
>         Attachments: YARN-799.0.patch
>
>
> The implementation of
> bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java
> Tells the container-executor to write PIDs to cgroup.procs:
> {code}
>   public String getResourcesOption(ContainerId containerId) {
>     String containerName = containerId.toString();
>     StringBuilder sb = new StringBuilder("cgroups=");
>     if (isCpuWeightEnabled()) {
>       sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs");
>       sb.append(",");
>     }
>     if (sb.charAt(sb.length() - 1) == ',') {
>       sb.deleteCharAt(sb.length() - 1);
>     }
>     return sb.toString();
>   }
> {code}
> Apparently, this file has not always been writeable:
> https://patchwork.kernel.org/patch/116146/
> http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html
> https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html
> The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file.
> {quote}
> $ uname -a
> Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
> {quote}
> As a result, when the container-executor tries to run, it fails with this error message:
> bq.    fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n",
> This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable:
> {quote}
> $ pwd
> /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001
> $ ls -l
> total 0
> -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs
> -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us
> -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us
> -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares
> -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release
> -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks
> {quote}
> I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem.
> I can think of several potential resolutions to this ticket:
> 1. Ignore the problem, and make people patch YARN when they hit this issue.
> 2. Write to /tasks instead of /cgroup.procs for everyone
> 3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks.
> 4. Add a config to yarn-site that lets admins specify which file to write to.
> Thoughts?
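
For illustration only, resolution 3 in the list above (check whether cgroup.procs is writable and fall back to tasks) might look roughly like the sketch below. This is not the committed change: per the Hudson comment, the integrated fix simply uses /tasks. The class name CgroupPidFileChooser and the method choosePidFile are hypothetical and do not exist in YARN. The sketch also assumes that a permission check in the calling process is a good enough signal. That may not hold if the process doing the actual write runs with different privileges.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of YARN. */
final class CgroupPidFileChooser {
    private CgroupPidFileChooser() {}

    /**
     * Picks the file inside the given cgroup directory that PIDs should be
     * written to: cgroup.procs if this process may write to it, else tasks.
     */
    static Path choosePidFile(Path cgroupDir) {
        Path procs = cgroupDir.resolve("cgroup.procs");
        // Files.isWritable is false when the file is missing or when this
        // process lacks write permission on it.
        return Files.isWritable(procs) ? procs : cgroupDir.resolve("tasks");
    }
}
```

With a directory like the one in the ls -l listing above, where cgroup.procs is read-only for the calling user, choosePidFile would be expected to return the tasks path.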