hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4115) Reducer gets stuck in shuffle when local disk out of space
Date Mon, 08 Sep 2008 21:09:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629293#action_12629293 ]

Chris Douglas commented on HADOOP-4115:

Is there space available on other drives on that TT that aren't being used, or are all configured
drives completely out of space? Does the reduce eventually fail and get rescheduled or does
it hang? In the latter case, is the task ever rescheduled/speculated or does this state persist
until the job is killed? In the former case, is it being rescheduled on the same node, ultimately
and incorrectly failing the job, or does the job eventually succeed?
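As an aside on checking the first question: one can inspect how much usable space remains across the TT's configured local directories with plain java.io.File calls. This is only an illustrative sketch, not Hadoop code; the directory list is a stand-in for whatever mapred.local.dir is set to on that node.

```java
import java.io.File;

public class LocalDirSpace {
    // Sums the usable bytes across a set of local dirs. The dirs here are
    // hypothetical; on a real TT they would come from mapred.local.dir.
    public static long usableBytes(String[] dirs) {
        long total = 0;
        for (String d : dirs) {
            File f = new File(d);
            if (f.exists()) {
                // getUsableSpace() accounts for write restrictions,
                // unlike getFreeSpace().
                total += f.getUsableSpace();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String[] dirs = { System.getProperty("java.io.tmpdir") };
        System.out.println("usable bytes: " + usableBytes(dirs));
    }
}
```

If this reports near-zero across all configured dirs, the second question (fail-and-reschedule vs. hang) becomes the interesting one.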

Quick aside: it would help a lot if the issue description presented an abstract of the
observed behavior; stack traces and other verbose diagnostic information are more readable
(especially by email) in a comment.

> Reducer gets stuck in shuffle when local disk out of space
> ----------------------------------------------------------
>                 Key: HADOOP-4115
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4115
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.2
>            Reporter: Marco Nicosia
>            Priority: Critical
> 2008-08-29 23:53:12,357 WARN org.apache.hadoop.mapred.ReduceTask: task_200808291851_0001_r_000245_0
> Merging of the local FS files threw an exception: org.apache.hadoop.fs.FSError: java.io.IOException:
> No space left on device
> 	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
> 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> 	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
> 	at java.io.DataOutputStream.write(DataOutputStream.java:90)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:339)
> 	at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
> 	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
> 	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
> 	at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
> 	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
> 	at java.io.DataOutputStream.write(DataOutputStream.java:90)
> 	at org.apache.hadoop.io.SequenceFile$UncompressedBytes.writeUncompressedBytes(SequenceFile.java:617)
> 	at org.apache.hadoop.io.SequenceFile$Writer.appendRaw(SequenceFile.java:1038)
> 	at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2626)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:1564)
> Caused by: java.io.IOException: No space left on device
> 	at java.io.FileOutputStream.writeBytes(Native Method)
> 	at java.io.FileOutputStream.write(FileOutputStream.java:260)
> 	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:197)
> 	... 16 more
> 2008-08-29 23:53:14,013 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.IOException: task_200808291851_0001_r_000245_0The reduce copier failed
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:329)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
