Message-ID: <750792219.1220908184567.JavaMail.jira@brutus>
Date: Mon, 8 Sep 2008 14:09:44 -0700 (PDT)
From: "Chris Douglas (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-4115) Reducer gets stuck in shuffle when local disk out of space
In-Reply-To: <1354779430.1220890304758.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ https://issues.apache.org/jira/browse/HADOOP-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629293#action_12629293 ]

Chris Douglas commented on HADOOP-4115:
---------------------------------------

Is there space available on other drives on that TT that aren't being used, or are all configured drives completely out of space?

Does the reduce eventually fail and get rescheduled, or does it hang? In the latter case, is the task ever rescheduled or speculated, or does this state persist until the job is killed? In the former case, is it rescheduled on the same node, ultimately and incorrectly failing the job, or does the job eventually succeed?

Quick aside: it would help a lot if the issue description presented an abstract of the observed behavior; stack traces and other verbose diagnostics are more readable (especially by email) in a comment.

> Reducer gets stuck in shuffle when local disk out of space
> ----------------------------------------------------------
>
>                 Key: HADOOP-4115
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4115
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.2
>            Reporter: Marco Nicosia
>            Priority: Critical
>
> 2008-08-29 23:53:12,357 WARN org.apache.hadoop.mapred.ReduceTask: task_200808291851_0001_r_000245_0 Merging of the local FS files threw an exception: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:339)
>         at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
>         at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>         at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>         at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>         at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.io.SequenceFile$UncompressedBytes.writeUncompressedBytes(SequenceFile.java:617)
>         at org.apache.hadoop.io.SequenceFile$Writer.appendRaw(SequenceFile.java:1038)
>         at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2626)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:1564)
> Caused by: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:260)
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:197)
>         ... 16 more
> 2008-08-29 23:53:14,013 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.IOException: task_200808291851_0001_r_000245_0The reduce copier failed
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:329)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
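[Archive note] The comment above asks whether every configured local drive on the TaskTracker is actually out of space. A quick way to check that on the affected host is a standalone sketch like the one below; the directory paths are hypothetical placeholders (not taken from the issue), and it simply uses the standard `java.io.File.getUsableSpace` call rather than anything Hadoop-specific.

```java
import java.io.File;

// Hypothetical spot-check for a TT's configured local dirs
// (the real list would come from mapred.local.dir).
public class LocalDirSpaceCheck {
    // Returns true if the filesystem holding dir has at least minBytes usable.
    static boolean hasSpace(File dir, long minBytes) {
        return dir.getUsableSpace() >= minBytes;
    }

    public static void main(String[] args) {
        // Placeholder paths for illustration only.
        String[] localDirs = { "/grid/0/mapred/local", "/grid/1/mapred/local" };
        long minBytes = 64L * 1024 * 1024; // rough floor for one merge output
        for (String d : localDirs) {
            File f = new File(d);
            System.out.println(d + " usable=" + f.getUsableSpace()
                    + " ok=" + hasSpace(f, minBytes));
        }
    }
}
```

If some drives report ample usable space while the merge still dies with ENOSPC, that would point at the merger pinning itself to one full volume rather than a cluster-wide shortage.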