Message-ID: <5963070.1161869118181.JavaMail.root@brutus>
Date: Thu, 26 Oct 2006 06:25:18 -0700 (PDT)
From: "Johan Oskarson (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Created: (HADOOP-643) failure closing block of file

failure closing block of file
-----------------------------

                 Key: HADOOP-643
                 URL: http://issues.apache.org/jira/browse/HADOOP-643
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.7.2
            Reporter: Johan Oskarson
            Priority: Critical


I've been getting "failure closing block of file" on random files. Both the datanode and the tasktracker are running on node7, and pinging the node works fine. My guess is that the datanode got stuck after the NullPointerException shown at the end of the traces below.
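For reference, an NPE in a recursive directory check like the one in the datanode trace further down is the kind of failure you get when java.io.File.listFiles() returns null rather than an empty array, which it does when the path is not a directory or an I/O error occurs (for example a data directory vanishing or losing permissions mid-scan). The sketch below only illustrates that general pattern and a defensive guard; the class and method names are made up for illustration and are not the actual FSDataset code.

import java.io.File;

/**
 * Minimal sketch (hypothetical names, not the actual Hadoop FSDataset code)
 * of how a recursive directory walk can throw a NullPointerException:
 * File.listFiles() returns null, not an empty array, when the path is not a
 * directory or an I/O error occurs, and iterating over that null throws.
 */
public class DirTreeCheckSketch {

    /** Naive walk: throws an NPE if listFiles() returns null. */
    static void checkDirTreeUnsafe(File dir) {
        for (File child : dir.listFiles()) {   // NPE here when listFiles() == null
            if (child.isDirectory()) {
                checkDirTreeUnsafe(child);
            }
        }
    }

    /** Defensive variant: treat a null listing as an unreadable directory. */
    static void checkDirTreeSafe(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            System.err.println("Unreadable or vanished directory: " + dir);
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                checkDirTreeSafe(child);
            }
        }
    }

    public static void main(String[] args) {
        // Point this at a data directory; a path that becomes unreadable
        // mid-run reproduces the unsafe variant's NPE.
        File root = new File(args.length > 0 ? args[0] : ".");
        checkDirTreeSafe(root);
    }
}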
The job cannot start because of:

java.io.IOException: failure closing block of file /home/hadoop/mapred/system/submit_99u9cd/.job.jar.crc to node node7:50010
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.internalClose(DFSClient.java:1199)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1163)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1241)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
	at org.apache.hadoop.fs.FSDataOutputStream$Summer.close(FSDataOutputStream.java:96)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
	at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:205)
	at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:190)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:77)
	at org.apache.hadoop.dfs.DistributedFileSystem.copyFromLocalFile(DistributedFileSystem.java:186)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:289)
	at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:314)
	at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:248)
	at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:280)
	at java.lang.Thread.run(Thread.java:595)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
	at java.io.DataInputStream.readFully(DataInputStream.java:176)
	at java.io.DataInputStream.readLong(DataInputStream.java:380)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.internalClose(DFSClient.java:1193)

Exception in datanode.out on node7:

Exception in thread "org.apache.hadoop.dfs.DataNode$DataXceiveServer@1c86be5" java.lang.NullPointerException
	at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:162)
	at org.apache.hadoop.dfs.FSDataset$FSDir.checkDirTree(FSDataset.java:162)
	at org.apache.hadoop.dfs.FSDataset$FSVolume.checkDirs(FSDataset.java:238)
	at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.checkDirs(FSDataset.java:326)
	at org.apache.hadoop.dfs.FSDataset.checkDataDir(FSDataset.java:522)
	at org.apache.hadoop.dfs.DataNode$DataXceiveServer.run(DataNode.java:480)
	at java.lang.Thread.run(Thread.java:595)

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira