Date: Fri, 27 Jul 2012 01:23:33 +0000 (UTC)
From: "George Datskos (JIRA)"
To: mapreduce-dev@hadoop.apache.org
Subject: [jira] [Created] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)

George Datskos created MAPREDUCE-4490:
-----------------------------------------

             Summary: JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
                 Key: MAPREDUCE-4490
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: task-controller, tasktracker
    Affects Versions: 1.0.3
            Reporter: George Datskos

When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 1) with more map tasks in a job than there are map slots in the cluster results in an immediate failure of the second task in each JVM, after which the JVM exits.

We have investigated this bug, and the root cause is as follows. With LinuxTaskController, the userlog directory for a task attempt (../userlogs/job/task-attempt) is created only on the first invocation, when the JVM is launched, because userlogs directories are created by the task-controller binary, which runs only *once* per JVM. No directory is created for any later task attempt that reuses the JVM, so attempting to create log.index for that attempt is guaranteed to fail with ENOENT, leading to immediate task failure and child JVM exit.

{quote}
2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting logging for a new task attempt_201207241401_0013_m_000027_0 in the same JVM as that of the first task /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000006_0
2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running child
ENOENT: No such file or directory
	at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
	at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
	at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
	at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
	at org.apache.hadoop.mapred.Child.main(Child.java:229)
{quote}

The above error occurs in a JVM that runs tasks 6 and 27. Task 6 runs without error. Then task 27 starts. The directory /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000027_0 is never created, so when mapred.Child tries to write the log.index file for task 27, it fails with ENOENT. Therefore, with LinuxTaskController, the second task in each JVM fails every time (and then the JVM exits).

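The failure mode described above can be reproduced in miniature without Hadoop. The following is only a sketch under stated assumptions: the class and method names are hypothetical and are not Hadoop APIs, the file content is a placeholder, and java.nio's NoSuchFileException stands in for the native ENOENT. It creates the attempt directory once, for the first attempt only, and then tries to write an index file for every attempt.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Illustrative model only; none of these names are Hadoop APIs. */
final class JvmReuseLogDirDemo {

    /** Stands in for writing log.index into a task attempt's userlog directory. */
    static void writeIndexFile(Path attemptDir) throws IOException {
        // Throws NoSuchFileException (the java.nio analogue of ENOENT)
        // when attemptDir does not exist.
        Files.write(attemptDir.resolve("log.index"),
                "placeholder\n".getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Simulates one reused JVM: the attempt directory is created only for the
     * first attempt (at "JVM launch"), then each attempt writes its index file.
     * Returns the first attempt whose write fails, or null if all succeed.
     */
    static String firstFailingAttempt(String... attempts) {
        try {
            Path jobDir = Files.createTempDirectory("userlogs-job");
            if (attempts.length > 0) {
                Files.createDirectories(jobDir.resolve(attempts[0]));
            }
            for (String attempt : attempts) {
                try {
                    writeIndexFile(jobDir.resolve(attempt));
                } catch (NoSuchFileException e) {
                    return attempt;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String failed = firstFailingAttempt(
                "attempt_201207241401_0013_m_000006_0",
                "attempt_201207241401_0013_m_000027_0");
        System.out.println("first failing attempt: " + failed);
    }
}
```

In this model the first attempt writes its index file and the second one fails, which matches the task 6 / task 27 sequence in the log.
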
Note that this problem does not occur when using the DefaultTaskController because the userlogs directories are created for each task (not just for each JVM as with LinuxTaskController).

For each task, the TaskRunner calls the TaskController's createLogDir method before attempting to write out an index file.

* DefaultTaskController#createLogDir: creates the log directory for each task
* LinuxTaskController#createLogDir: does nothing
** the task-controller binary creates the log directory [create_attempt_directories] (but only for the first task)

Possible Solution: add a new command to task-controller, *initialize task*, to create attempt directories. Call that command, with ShellCommandExecutor, in the LinuxTaskController#createLogDir method.
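The difference between the two createLogDir behaviors, and the effect a per-task directory step would have, can be sketched as follows. This is a simplified model, not Hadoop code: LogDirHook, PER_TASK, NO_OP and run are hypothetical names, and the per-task hook creates the directory directly, whereas the proposal above would have LinuxTaskController invoke a new task-controller command that does not exist yet.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; none of these names are Hadoop APIs. */
final class CreateLogDirSketch {

    /** Hypothetical stand-in for the per-task createLogDir call. */
    interface LogDirHook {
        void createLogDir(Path attemptDir) throws IOException;
    }

    /** Creates the directory for every task, as described for DefaultTaskController. */
    static final LogDirHook PER_TASK = dir -> Files.createDirectories(dir);

    /** Does nothing, as described for LinuxTaskController. */
    static final LogDirHook NO_OP = dir -> { };

    /**
     * Runs the attempts in one simulated JVM and returns those whose index
     * write succeeded. The loop stops at the first failure, mirroring the
     * child JVM exiting.
     */
    static List<String> run(LogDirHook hook, String... attempts) {
        List<String> succeeded = new ArrayList<>();
        try {
            Path jobDir = Files.createTempDirectory("userlogs-job");
            // At "JVM launch" only the first attempt's directory is created.
            if (attempts.length > 0) {
                Files.createDirectories(jobDir.resolve(attempts[0]));
            }
            for (String attempt : attempts) {
                Path dir = jobDir.resolve(attempt);
                hook.createLogDir(dir); // invoked before every task
                try {
                    Files.write(dir.resolve("log.index"), new byte[0]);
                    succeeded.add(attempt);
                } catch (NoSuchFileException e) {
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return succeeded;
    }

    public static void main(String[] args) {
        System.out.println("no-op hook:    " + run(NO_OP, "task6", "task27"));
        System.out.println("per-task hook: " + run(PER_TASK, "task6", "task27"));
    }
}
```

With the no-op hook only the first attempt succeeds; with the per-task hook every attempt does. The sketch deliberately leaves out directory ownership and permissions, which are presumably why the real directories are created by the task-controller binary rather than by the TaskTracker itself.
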