hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2391) Speculative Execution race condition with output paths
Date Thu, 07 Feb 2008 19:53:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566761#action_12566761
] 

Devaraj Das commented on HADOOP-2391:
-------------------------------------

bq. The tasks should throw an exception when they try to create new files, since their directory
will have been removed.

Note that due to HADOOP-2759, the tasks would end up creating the directory hierarchy implictly
without their knowledge. I think HADOOP-2759 should be fixed to address this problem without
having to complicate things to do with keeping track of speculative tasks after all the TIPs
of a job complete, etc. (as Arun and Amareshwari pointed out). But the thing that is bothering
me is - could addressing HADOOP-2759 lead to an incompatible change (and hence shouldn't be
done for 0.16.1)?

> Speculative Execution race condition with output paths
> ------------------------------------------------------
>
>                 Key: HADOOP-2391
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2391
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>         Environment: all
>            Reporter: Dennis Kubes
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.16.1
>
>         Attachments: HADOOP-2391-1-20071211.patch
>
>
> I am tracking a problem where when speculative execution is enabled, there is a race
condition when trying to read output paths from a previously completed job.  More specifically
when reduce tasks run their output is put into a working directory under the task name until
the task in completed.  The directory name is something like workdir/_taskid.  Upon completion
the output get moved into workdir.  Regular tasks are checked for this move and not considered
completed until this move is made.  I have not verified it but all indications point to speculative
tasks NOT having this same check for completion and more importantly removal when killed.
 So what we end up with when trying to read the output of previous tasks with speculative
execution enabled is the possibility that previous workdir/_taskid will be present when the
output directory is read by a chained job.  Here is an error when supports my theory:
> Generator: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename
/u01/hadoop/mapred/temp/generate-temp-1197104928603/_task_200712080949_0005_r_000014_1
>         at org.apache.hadoop.dfs.NameNode.open(NameNode.java:234)
>         at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:389)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:644)
>         at org.apache.hadoop.ipc.Client.call(Client.java:507)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:186)
>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:839)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:831)
>         at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:263)
>         at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:114)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1356)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
>         at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:87)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:429)
>         at org.apache.nutch.crawl.Generator.run(Generator.java:563)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
>         at org.apache.nutch.crawl.Generator.main(Generator.java:526)
> I will continue to research this and post as I make progress on tracking down this bug.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message