From: "dhruba borthakur (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Thu, 31 May 2007 01:34:15 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-1396) FileNotFound exception on DFS block
Message-ID: <32673087.1180600455778.JavaMail.jira@brutus>
In-Reply-To: <5115252.1179746476115.JavaMail.jira@brutus>
Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Reply-To: hadoop-dev@lucene.apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500317 ]

dhruba borthakur commented on HADOOP-1396:
------------------------------------------

The
DFSClient uses a random number generator to generate the name of the temporary file where the latest block of the file being written is cached. The above problem could theoretically occur if two instances of DFSClient get the same value from the random number generator at around the same time. I suspect that enabling speculative execution results in more concurrent tasks on the same node, which increases the probability of the same tmp file being used concurrently by multiple tasks. Hence we see this problem more often when speculative execution is switched on.

An alternative is to use File.createTempFile, which atomically creates a new file with a unique name, so two clients can never be handed the same temporary file.

> FileNotFound exception on DFS block
> -----------------------------------
>
>                 Key: HADOOP-1396
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1396
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.3
>            Reporter: Devaraj Das
>             Fix For: 0.14.0
>
>
> Got a couple of exceptions of the form illustrated below. This was for a randomwriter run (and every node in the cluster has multiple disks).
> java.io.FileNotFoundException: /tmp/dfs/data/tmp/client-8395631522349067878 (No such file or directory)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:106)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1323)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.flush(DFSClient.java:1274)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.write(DFSClient.java:1256)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>     at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.write(ChecksumFileSystem.java:402)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>     at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:775)
>     at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:158)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:187)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1709)
>
> So it seems like the bug reported in HADOOP-758 still exists.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
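The suggested fix can be sketched with a standalone example (this is not the DFSClient patch itself; the class and method names here are illustrative). It contrasts a random-number naming scheme, where two concurrent clients can derive the same path, with File.createTempFile, which atomically creates a fresh file and returns a name guaranteed not to collide:

```java
import java.io.File;
import java.io.IOException;

public class UniqueTempFile {
    // Alternative named in the comment: File.createTempFile atomically
    // creates a brand-new, uniquely named file, so two concurrent callers
    // can never be handed the same path. By contrast, a name derived from
    // a shared random source (e.g. "client-" + random.nextLong(), as in
    // the stack trace above) can repeat across clients.
    static File create() throws IOException {
        File f = File.createTempFile("client-", ".tmp");
        f.deleteOnExit(); // clean up when the JVM exits
        return f;
    }

    public static void main(String[] args) throws IOException {
        File a = create();
        File b = create();
        // Even back-to-back calls yield distinct files that already exist
        // on disk, so no later FileNotFoundException on open.
        System.out.println(a.exists() && b.exists() && !a.equals(b));
    }
}
```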