hadoop-mapreduce-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
Date Fri, 13 Jun 2014 21:37:02 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031207#comment-14031207 ]

Eric Yang commented on MAPREDUCE-4490:
--------------------------------------

+1 looks good.

> JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4490
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task-controller, tasktracker
>    Affects Versions: 0.20.205.0, 1.0.3, 1.2.1
>            Reporter: George Datskos
>            Assignee: sam liu
>            Priority: Critical
>              Labels: patch
>             Fix For: 1.2.1
>
>         Attachments: MAPREDUCE-4490.patch, MAPREDUCE-4490.patch, MAPREDUCE-4490.patch
>
>
> When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 1) with
> more map tasks in a job than there are map slots in the cluster results in immediate
> failure of the second task in each JVM (and the JVM then exits). We have investigated this
> bug, and the root cause is as follows. When using LinuxTaskController, the userlog directory
> for a task attempt (../userlogs/job/task-attempt) is created only on the first invocation
> (when the JVM is launched), because userlogs directories are created by the task-controller
> binary, which runs only *once* per JVM. Therefore, when a later task in the same JVM tries
> to create its log.index, the call is guaranteed to fail with ENOENT, leading to immediate
> task failure and child JVM exit.
> {quote}
> 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting logging for a new task attempt_201207241401_0013_m_000027_0 in the same JVM as that of the first task /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000006_0
> 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running child
> ENOENT: No such file or directory
>         at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
>         at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
>         at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
>         at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
>         at org.apache.hadoop.mapred.Child.main(Child.java:229)
> {quote}
> The above error occurs in a JVM that runs tasks 6 and 27. Task 6 completes smoothly; then
> task 27 starts. The directory /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000027_0
> is never created, so when mapred.Child tries to write the log.index file for task 27, it fails
> with ENOENT because the attempt_201207241401_0013_m_000027_0 directory does not exist. Therefore,
> the second task in each JVM is guaranteed to fail (and the JVM then exits) whenever
> LinuxTaskController is used. Note that this problem does not occur with DefaultTaskController,
> because it creates the userlogs directories for each task (not just once per JVM, as with
> LinuxTaskController).
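The failure mode described above can be reproduced without Hadoop. The following is a minimal, hypothetical Java simulation (the class and method names are illustrative, not Hadoop APIs): the attempt directory is created either once per JVM, as the task-controller binary does under LinuxTaskController, or once per task, as DefaultTaskController does. Writing log.index into a directory that was never created raises java.nio.file.NoSuchFileException, Java's counterpart of ENOENT.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free simulation of the MAPREDUCE-4490 failure mode.
public class JvmReuseLogDirDemo {

    /**
     * Runs the given task attempts one after another in a single simulated JVM.
     * If perTaskDirs is false (LinuxTaskController-like), the attempt log directory
     * is created only for the first attempt, mimicking a task-controller binary that
     * runs once per JVM. If true (DefaultTaskController-like), every attempt gets one.
     * Returns the attempts whose log.index write failed.
     */
    static List<String> runTasksInOneJvm(List<String> attempts, boolean perTaskDirs) {
        List<String> failed = new ArrayList<>();
        try {
            Path jobDir = Files.createTempDirectory("job_");
            for (int i = 0; i < attempts.size(); i++) {
                Path attemptDir = jobDir.resolve(attempts.get(i));
                if (i == 0 || perTaskDirs) {
                    Files.createDirectories(attemptDir);
                }
                try {
                    // Stand-in for writing log.index (TaskLog.writeToIndexFile in the trace).
                    Files.write(attemptDir.resolve("log.index"),
                            "illustrative content".getBytes(StandardCharsets.UTF_8));
                } catch (NoSuchFileException e) {
                    // ENOENT analogue: the real child JVM exits at this point.
                    failed.add(attempts.get(i));
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> attempts = List.of(
                "attempt_201207241401_0013_m_000006_0",
                "attempt_201207241401_0013_m_000027_0");
        // Reports the second attempt: its directory was never created.
        System.out.println("once per JVM:  " + runTasksInOneJvm(attempts, false));
        // Reports an empty list: every attempt has its own directory.
        System.out.println("once per task: " + runTasksInOneJvm(attempts, true));
    }
}
```

The only difference between the two runs is whether directory creation happens per task or per JVM, which is exactly the DefaultTaskController versus LinuxTaskController distinction in the report.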
> For each task, the TaskRunner calls the TaskController's createLogDir method before attempting
> to write out an index file.
> * DefaultTaskController#createLogDir: creates log directory for each task
> * LinuxTaskController#createLogDir: does nothing
> ** the task-controller binary creates the log directory [create_attempt_directories], but only for the first task
> Possible solution: add a new *initialize task* command to the task-controller binary that
> creates the attempt directories, and call that command, via ShellCommandExecutor, from the
> LinuxTaskController#createLogDir method.
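A minimal sketch of the proposed fix, assuming a new task-controller command named "initialize_task". The command name, its argument order, the binary path, and all class names here are hypothetical; the proposal only says to call the new command via ShellCommandExecutor from LinuxTaskController#createLogDir. To stay runnable without Hadoop, the sketch hides command execution behind a small CommandRunner interface and uses java.lang.ProcessBuilder in place of Hadoop's ShellCommandExecutor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical sketch of the proposed fix; not Hadoop's actual LinuxTaskController.
public class InitializeTaskSketch {

    /** Runs a command line and returns its exit code (stand-in for ShellCommandExecutor). */
    interface CommandRunner {
        int run(List<String> argv);
    }

    /** Real runner: launches the command as a child process and waits for it. */
    static final CommandRunner PROCESS_RUNNER = argv -> {
        try {
            return new ProcessBuilder(argv).inheritIO().start().waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running " + argv, e);
        }
    };

    /** Hypothetical controller whose createLogDir delegates to the setuid binary. */
    static final class LinuxTaskControllerSketch {
        private final String taskControllerPath;
        private final CommandRunner runner;

        LinuxTaskControllerSketch(String taskControllerPath, CommandRunner runner) {
            this.taskControllerPath = taskControllerPath;
            this.runner = runner;
        }

        /**
         * Invoked once per task (not once per JVM), so every attempt in a reused
         * JVM gets its userlog directory before log.index is written.
         */
        void createLogDir(String user, String jobId, String attemptId) {
            // Hypothetical command name and argument order.
            List<String> argv = List.of(taskControllerPath, user, "initialize_task", jobId, attemptId);
            int exitCode = runner.run(argv);
            if (exitCode != 0) {
                throw new IllegalStateException(
                        "initialize_task failed for " + attemptId + " (exit code " + exitCode + ")");
            }
        }
    }

    public static void main(String[] args) {
        // Fake runner that only prints the command it would execute.
        LinuxTaskControllerSketch tc = new LinuxTaskControllerSketch(
                "/path/to/task-controller", argv -> {
                    System.out.println(String.join(" ", argv));
                    return 0;
                });
        tc.createLogDir("alice", "job_201207241401_0013", "attempt_201207241401_0013_m_000006_0");
        tc.createLogDir("alice", "job_201207241401_0013", "attempt_201207241401_0013_m_000027_0");
    }
}
```

Because the TaskRunner already calls createLogDir for every task, routing it to a per-task command would give the second attempt in a reused JVM its directory before log.index is written. In this sketch a non-zero exit from the binary surfaces immediately as an exception instead of as a later ENOENT.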



--
This message was sent by Atlassian JIRA
(v6.2#6252)
