hadoop-mapreduce-issues mailing list archives

From "Arpit Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5198) Race condition in cleanup during task tracker renint with LinuxTaskController
Date Wed, 01 May 2013 00:32:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646214#comment-13646214 ]

Arpit Gupta commented on MAPREDUCE-5198:
----------------------------------------

[~acmurthy]

Thanks for the suggestion. Patch attached.

I also changed LinuxTaskController.java to log when a delete fails.

We will now see logs like the following:

{code}
id/4/hdp/mapred/local,/grid/5/hdp/mapred/local
2013-04-30 23:05:37,033 INFO org.apache.hadoop.mapred.LinuxTaskController: deleteAsUser: [/usr/lib/hadoop/libexec/../bin/task-controller, hrt_qa, /grid/0/hdp/mapred/local,/grid/1/hdp/mapred/local,/grid/2/hdp/mapred/local,/grid/3/hdp/mapred/local,/grid/4/hdp/mapred/local,/grid/5/hdp/mapred/local, 3, ]
2013-04-30 23:05:37,033 WARN org.apache.hadoop.mapred.LinuxTaskController: Exit code from task is : 255
2013-04-30 23:05:37,033 INFO org.apache.hadoop.mapred.LinuxTaskController: Output from deleteAsUser LinuxTaskController:
2013-04-30 23:05:37,033 INFO org.apache.hadoop.mapred.TaskController: Reading task controller config from /etc/hadoop/conf.empty/taskcontroller.cfg
2013-04-30 23:05:37,033 INFO org.apache.hadoop.mapred.TaskController: main : command provided 3
2013-04-30 23:05:37,033 INFO org.apache.hadoop.mapred.TaskController: main : user is hrt_qa
2013-04-30 23:05:37,033 INFO org.apache.hadoop.mapred.TaskController: Good mapred-local-dirs are /grid/0/hdp/mapred/local,/grid/1/hdp/mapred/local,/grid/2/hdp/mapred/local,/grid/3/hdp/mapred/local,/grid/4/hdp/mapred/local,/grid/5/hdp/mapred/local
2013-04-30 23:05:37,033 INFO org.apache.hadoop.mapred.TaskController: Unreadable directory /grid/4/hdp/mapred/local/taskTracker/hrt_qa/jobcache/job_201304302257_0002/attempt_201304302257_0002_m_000022_0. Skipping..
2013-04-30 23:05:37,033 INFO org.apache.hadoop.mapred.TaskController: Couldn't delete directory /grid/5/hdp/mapred/local/taskTracker/hrt_qa/jobcache/job_201304302257_0002/attempt_201304302257_0002_m_000022_0/output - No such file or directory
2013-04-30 23:05:37,034 INFO org.apache.hadoop.mapred.TaskController: rmdir of /grid/5/hdp/mapred/local/taskTracker/hrt_qa/ failed - Directory not empty
2013-04-30 23:05:37,034 ERROR org.apache.hadoop.mapred.TaskTracker: Got fatal exception while reinitializing TaskTracker: org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
        at org.apache.hadoop.util.Shell.run(Shell.java:182)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
        at org.apache.hadoop.mapred.LinuxTaskController.deleteAsUser(LinuxTaskController.java:281)
        at org.apache.hadoop.mapred.TaskTracker.deleteUserDirectories(TaskTracker.java:779)
        at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:816)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2704)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3934)
{code}
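
For readers without the patch at hand, the kind of logging described above can be sketched roughly as follows. This is not the attached patch and does not use Hadoop's classes: it is a minimal standalone illustration built on the JDK's ProcessBuilder, and the names DeleteAsUserSketch, Result and runAndLog are hypothetical. The idea is simply to capture the helper's output and log it, together with the exit code, whenever the delete command exits with a non-zero status.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch, not the real LinuxTaskController.
class DeleteAsUserSketch {
    private static final Logger LOG = Logger.getLogger("DeleteAsUserSketch");

    /** Exit code plus combined stdout/stderr of the helper process. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Runs the command and logs its exit code and output if it fails. */
    static Result runAndLog(List<String> command) {
        LOG.info("deleteAsUser: " + command);
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        try {
            Process process = builder.start();
            StringBuilder output = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                // Assumed behaviour: surface the helper's own messages on failure.
                LOG.warning("Exit code from task is : " + exitCode);
                LOG.info("Output from deleteAsUser:\n" + output);
            }
            return new Result(exitCode, output.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            throw new IllegalStateException("Interrupted while waiting for command", e);
        }
    }
}
```

In this sketch the caller still decides what to do with a non-zero exit code; the only point is that the helper's output is no longer lost when the delete fails.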
                
> Race condition in cleanup during task tracker renint with LinuxTaskController
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5198
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5198
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.2.0
>            Reporter: Arpit Gupta
>            Assignee: Arpit Gupta
>         Attachments: MAPREDUCE-5198.patch
>
>
> This was noticed when the job tracker was restarted while jobs were running and asked the task tracker to reinitialize.
> The task tracker would then fail with an error like
> {code}
> 2013-04-27 20:19:09,627 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /grid/0/hdp/mapred/local,/grid/1/hdp/mapred/local,/grid/2/hdp/mapred/local,/grid/3/hdp/mapred/local,/grid/4/hdp/mapred/local,/grid/5/hdp/mapred/local
> 2013-04-27 20:19:09,628 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 42075 caught: java.nio.channels.ClosedChannelException
> 	at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
> 	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
> 	at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1717)
> 	at org.apache.hadoop.ipc.Server.access$2000(Server.java:98)
> 	at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:744)
> 	at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:808)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1433)
> 2013-04-27 20:19:09,628 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 42075: exiting
> 2013-04-27 20:19:10,414 ERROR org.apache.hadoop.mapred.TaskTracker: Got fatal exception while reinitializing TaskTracker: org.apache.hadoop.util.Shell$ExitCodeException:
> 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
> 	at org.apache.hadoop.util.Shell.run(Shell.java:182)
> 	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
> 	at org.apache.hadoop.mapred.LinuxTaskController.deleteAsUser(LinuxTaskController.java:281)
> 	at org.apache.hadoop.mapred.TaskTracker.deleteUserDirectories(TaskTracker.java:779)
> 	at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:816)
> 	at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2704)
> 	at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3934)
> {code} 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
