hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1407) Failed tasks not killing job
Date Tue, 22 May 2007 05:46:16 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1407:
----------------------------------

    Status: Patch Available  (was: Open)

Marking this patch as ready to go. HADOOP-1411 addresses the other issue, the 'AlreadyBeingCreatedException' seen during the run.

> Failed tasks not killing job
> ----------------------------
>
>                 Key: HADOOP-1407
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1407
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Nigel Daley
>         Assigned To: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1407_1_20070522.patch
>
>
> Some test runs on May 10 and then since May 14 contain failed tasks (all 4 executions fail) but the job does not fail.  Given these dates, my suspicion is that this could be related to HADOOP-1350 or turning on speculative execution in my testing.
> Some JobTracker log snippets for tip_0005_m_002705:
> 2007-05-10 21:21:16,820 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_0005_m_002705
> 2007-05-10 21:21:16,820 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0005_m_002705_0' to tip tip_0005_m_002705, for tracker 'tracker_2982.com:50050'
> ...
> 2007-05-10 21:22:12,542 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0005_m_002705_0: java.io.FileNotFoundException: /e/c/k/hadoopqa/dfs/data500/tmp/client-5285738463038775723 (No such file or directory)
>         at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(FileInputStream.java:106)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1322)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1415)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:48)
>         at org.apache.hadoop.fs.FSDataOutputStream$Buffer.close(FSDataOutputStream.java:72)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:92)
>         at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.close(ChecksumFileSystem.java:414)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:48)
>         at org.apache.hadoop.fs.FSDataOutputStream$Buffer.close(FSDataOutputStream.java:72)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:92)
>         at org.apache.hadoop.fs.TestDFSIO$WriteMapper.doIO(TestDFSIO.java:207)
>         at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:123)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:187)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1709)
> ...
> 2007-05-10 21:22:15,700 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0005_m_002705_0' has been lost.
> 2007-05-10 21:22:15,702 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0005_m_002705_0' from 'tracker_2982.com:50050'
> 2007-05-10 21:22:15,710 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0005_m_002705
> 2007-05-10 21:22:15,710 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0005_m_002705_1' to tip tip_0005_m_002705, for tracker 'tracker_2617.com:50050'
> ...
> 2007-05-10 21:22:22,665 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0005_m_002705_1: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_3705 for DFSClient_task_0005_m_002705_1 on client 72.30.50.13, because this file is already being created by DFSClient_task_0005_m_002705_0 on 1.2.3.4
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:606)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:294)
>         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:341)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:573)
>         at org.apache.hadoop.ipc.Client.call(Client.java:471)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
>         at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateNewBlock(DFSClient.java:1172)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1114)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1321)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.flush(DFSClient.java:1273)
>         at java.io.FilterOutputStream.flush(FilterOutputStream.java:123)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at java.io.FilterOutputStream.flush(FilterOutputStream.java:123)
>         at java.io.FilterOutputStream.flush(FilterOutputStream.java:123)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:91)
>         at org.apache.hadoop.fs.TestDFSIO$WriteMapper.doIO(TestDFSIO.java:207)
>         at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:123)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:187)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1709)
> ...
> 2007-05-10 21:22:23,604 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0005_m_002705_1' has been lost.
> 2007-05-10 21:22:23,605 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0005_m_002705_1' from 'tracker_2617.com:50050'
> 2007-05-10 21:22:23,607 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0005_m_002705
> 2007-05-10 21:22:23,608 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0005_m_002705_2' to tip tip_0005_m_002705, for tracker 'tracker_2552.com:50050'
> ...
> 2007-05-10 21:22:29,280 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0005_m_002705_2: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_3705 for DFSClient_task_0005_m_002705_2 on client 72.30.52.8, because this file is already being created by DFSClient_task_0005_m_002705_0 on 1.2.3.4
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:606)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:294)
>         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:341)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:573)
>         at org.apache.hadoop.ipc.Client.call(Client.java:471)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
>         at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateNewBlock(DFSClient.java:1172)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1114)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1321)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.flush(DFSClient.java:1273)
>         at java.io.FilterOutputStream.flush(FilterOutputStream.java:123)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at java.io.FilterOutputStream.flush(FilterOutputStream.java:123)
>         at java.io.FilterOutputStream.flush(FilterOutputStream.java:123)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:91)
>         at org.apache.hadoop.fs.TestDFSIO$WriteMapper.doIO(TestDFSIO.java:207)
>         at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:123)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:187)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1709)
> ...
> 2007-05-10 21:22:30,044 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0005_m_002705_2' has been lost.
> 2007-05-10 21:22:30,045 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0005_m_002705_2' from 'tracker_2552.com:50050'
> 2007-05-10 21:22:30,065 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0005_m_002705
> 2007-05-10 21:22:30,066 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0005_m_002705_3' to tip tip_0005_m_002705, for tracker 'tracker_2618.com:50050'
> ...
> 2007-05-10 21:22:36,099 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0005_m_002705_3: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_3705 for DFSClient_task_0005_m_002705_3 on client 72.30.50.14, because this file is already being created by DFSClient_task_0005_m_002705_0 on 1.2.3.4
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:606)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:294)
>         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:341)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:573)
>         at org.apache.hadoop.ipc.Client.call(Client.java:471)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
>         at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateNewBlock(DFSClient.java:1172)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1114)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1321)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.flush(DFSClient.java:1273)
>         at java.io.FilterOutputStream.flush(FilterOutputStream.java:123)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at java.io.FilterOutputStream.flush(FilterOutputStream.java:123)
>         at java.io.FilterOutputStream.flush(FilterOutputStream.java:123)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:91)
>         at org.apache.hadoop.fs.TestDFSIO$WriteMapper.doIO(TestDFSIO.java:207)
>         at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:123)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:187)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1709)
> 2007-05-10 21:22:36,099 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0005_m_002705_3' has been lost.
> 2007-05-10 21:22:36,100 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress tip_0005_m_002705 has failed 4 times.
> 2007-05-10 21:22:36,101 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0005_m_002705_3' from 'tracker_2618.com:50050'
> ...
> 2007-05-10 21:28:29,456 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0005_m_002705_2' from 'tracker_2552.com:50050'
> ...
> 2007-05-10 21:28:30,757 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0005_m_002705_3' from 'tracker_2618.com:50050'
> ...
> 2007-05-10 21:28:33,782 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0005_m_002705_1' from 'tracker_2617.com:50050'
> ...
> 2007-05-10 21:28:35,884 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0005_m_002705_0' from 'tracker_2982.com:50050'
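
For anyone triaging similar runs, a minimal sketch of a log scan that groups the errors quoted above by TIP. It is not part of Hadoop or of the attached patch: the function name `summarize` and both regular expressions are assumptions modeled only on the message shapes in this report ("Error from task_..._N:" and "TaskInProgress tip_... has failed N times."), and it expects each log record on a single line.

```python
import re
from collections import defaultdict

# Hypothetical patterns, modeled on the JobTracker messages quoted in this report.
ERROR_RE = re.compile(r"Error from (task_\d+_[mr]_\d+)_(\d+):")
FAILED_RE = re.compile(r"TaskInProgress (tip_\d+_[mr]_\d+) has failed (\d+) times")


def summarize(lines):
    """Group reported attempt errors by TIP id.

    Returns a dict mapping each TIP id to:
      "attempts":     set of attempt numbers that logged an error
      "failed_times": N from the "has failed N times" message, or None
    """
    summary = defaultdict(lambda: {"attempts": set(), "failed_times": None})
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            # task_0005_m_002705 -> tip_0005_m_002705
            tip = "tip" + m.group(1)[len("task"):]
            summary[tip]["attempts"].add(int(m.group(2)))
            continue
        m = FAILED_RE.search(line)
        if m:
            summary[m.group(1)]["failed_times"] = int(m.group(2))
    return dict(summary)
```

Calling `summarize(open(path))` on a JobTracker log (the path is whatever your installation uses) gives one entry per TIP that reported an error. A TIP with a non-None `failed_times` in a job that still completed would match the symptom described here.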

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
