hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-1098) output blocks lost when speculative execution is used
Date Mon, 07 May 2007 21:18:15 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-1098.
-----------------------------------

    Resolution: Duplicate

This was fixed by HADOOP-1127.

> output blocks lost when speculative execution is used
> -----------------------------------------------------
>
>                 Key: HADOOP-1098
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1098
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Nigel Daley
>            Priority: Critical
>             Fix For: 0.13.0
>
>
> The Sort benchmark completes successfully for me on the latest trunk (0.12.1 candidate)
> with speculative execution turned on.  Validation of the Sort benchmark output, however,
> is failing.  I see one sort output file (part-00375) that is far smaller than all the
> rest; in fact, it is exactly 1 block long.
> dfs ls output:
> ...
> /user/hadoopqa/sortBenchmark100/output/part-00373       <r 3>   2971688212
> /user/hadoopqa/sortBenchmark100/output/part-00374       <r 3>   2973451660
> /user/hadoopqa/sortBenchmark100/output/part-00375       <r 3>   134217728
> /user/hadoopqa/sortBenchmark100/output/part-00376       <r 3>   2972933208
> /user/hadoopqa/sortBenchmark100/output/part-00377       <r 3>   2972309956
> ...
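[Editor's note: the size of the truncated file is telling. 134217728 bytes is exactly 128 MiB, i.e. one HDFS block on a cluster configured with a 128 MB block size (the stock `dfs.block.size` default of this era was 64 MB, so a larger site setting is assumed here). A quick sanity check against the sizes from the listing above:]

```python
# Sizes copied from the dfs ls output above (bytes).
part_sizes = {
    "part-00373": 2971688212,
    "part-00374": 2973451660,
    "part-00375": 134217728,   # the suspect file
    "part-00376": 2972933208,
    "part-00377": 2972309956,
}

BLOCK_SIZE = 128 * 1024 * 1024  # assumed dfs.block.size of 128 MB

# part-00375 is exactly one block long...
assert part_sizes["part-00375"] == BLOCK_SIZE

# ...while each of its neighbours needs 23 blocks.
for name, size in part_sizes.items():
    blocks = -(-size // BLOCK_SIZE)  # ceiling division
    print(name, size, "bytes =", blocks, "block(s)")
```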
> During the Sort benchmark, I see 9 AlreadyBeingCreatedExceptions in the NameNode log
> for this file (and more of these exceptions for other files too).  I also include here
> the one PendingReplicationMonitor WARN message from the NameNode log in case it's relevant:
> ...
> 2007-03-08 21:56:31,747 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_-849195508701590166
> ...
> 2007-03-08 22:04:35,471 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
> 2007-03-08 22:04:35,476 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
> org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:701)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:283)
>         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
> ...
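[Editor's note: the AlreadyBeingCreatedExceptions are consistent with two speculative attempts of the same reduce (the `_0` and `_1` attempts of task_0002_r_000375) racing to create the same output path. A minimal sketch of the namenode-side check, using invented names (`PendingCreates`, `AlreadyBeingCreatedError`, a shortened path) rather than the real FSNamesystem code:]

```python
class AlreadyBeingCreatedError(Exception):
    """Stand-in for org.apache.hadoop.dfs.AlreadyBeingCreatedException."""

class PendingCreates:
    """Toy model of the namenode's pendingCreates map: path -> lease holder."""
    def __init__(self):
        self._pending = {}

    def start_file(self, path, client):
        # If a different client already holds the path open for create,
        # refuse -- this is the "pendingCreates is non-null" failure above.
        if path in self._pending and self._pending[path] != client:
            raise AlreadyBeingCreatedError(
                f"failed to create file {path} for {client}")
        self._pending[path] = client

nn = PendingCreates()
nn.start_file("/output/part-00375", "DFSClient_task_0002_r_000375_0")
try:
    # The speculative second attempt hits the existing pending create.
    nn.start_file("/output/part-00375", "DFSClient_task_0002_r_000375_1")
except AlreadyBeingCreatedError as e:
    print("second attempt rejected:", e)
```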
> During sort validation, I get this exception in the JobTracker log:
> 2007-03-08 22:51:32,017 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0003_m_001379_0: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:57)
>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:91)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1525)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1436)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1482)
>         at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:72)
>         at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
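[Editor's note: this EOFException is what `DataInputStream.readFully` throws when a record's declared length runs past the end of a truncated file, which is exactly what a one-block part file would produce. An illustrative Python equivalent, with plain length-prefixed records standing in for SequenceFile's actual on-disk format:]

```python
import io
import struct

def write_record(buf, payload):
    # Length-prefixed record, loosely analogous to a SequenceFile entry.
    buf.write(struct.pack(">i", len(payload)))
    buf.write(payload)

def read_record(buf):
    header = buf.read(4)
    if len(header) < 4:
        raise EOFError("truncated length header")
    (length,) = struct.unpack(">i", header)
    payload = buf.read(length)
    if len(payload) < length:
        # DataInputStream.readFully raises EOFException in the same case.
        raise EOFError("record body shorter than declared length")
    return payload

buf = io.BytesIO()
write_record(buf, b"hello world")
data = buf.getvalue()[:-4]          # simulate a file cut off mid-record
try:
    read_record(io.BytesIO(data))
except EOFError as e:
    print("validation fails:", e)
```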
> I also saw this in the DataNode log during sort validation, but it could be unrelated:
> 2007-03-09 01:04:41,323 WARN org.apache.hadoop.dfs.DataNode: java.io.IOException: Unexpected error trying to delete block blk_-5047673597270588432. Block not found in blockMap.
> 	at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:596)
> 	at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:429)
> 	at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1053)
> 	at java.lang.Thread.run(Thread.java:619)
> Since speculative execution will be turned off by default in 0.12.1, I am assigning
> this to 0.13.0.
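[Editor's note: the job-level switch referred to here was, in Hadoop of this vintage, the `mapred.speculative.execution` property. A hadoop-site.xml fragment of the kind implied, shown as an illustration rather than a change made by this issue:]

```xml
<property>
  <name>mapred.speculative.execution</name>
  <value>false</value>
  <description>Disable speculative task attempts until the
  duplicate-output race (HADOOP-1098/HADOOP-1127) is fixed.</description>
</property>
```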

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

