From: "dhruba borthakur (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Thu, 31 May 2007 01:34:15 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-1396) FileNotFound exception on DFS block
Message-ID: <32673087.1180600455778.JavaMail.jira@brutus>
In-Reply-To: <5115252.1179746476115.JavaMail.jira@brutus>
Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Reply-To: hadoop-dev@lucene.apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500317 ]

dhruba borthakur commented on HADOOP-1396:
------------------------------------------

The
DFSClient uses a random number generator to generate the name of the temporary file where the latest block of the file being written is cached. The above problem could theoretically occur if two instances of DFSClient get the same value from the random number generator at around the same time. I suspect that enabling speculative execution results in more concurrent tasks on the same node, which increases the probability of the same tmp file being used concurrently by multiple tasks. Hence we see this problem more often when speculative execution is switched on.

An alternative is to use File.createTempFile, which atomically creates a new file with a unique name, so two clients can never be handed the same temporary file.

> FileNotFound exception on DFS block
> -----------------------------------
>
>                 Key: HADOOP-1396
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1396
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.3
>            Reporter: Devaraj Das
>             Fix For: 0.14.0
>
>
> Got a couple of exceptions of the form illustrated below. This was for a randomwriter run (and every node in the cluster has multiple disks).
> java.io.FileNotFoundException: /tmp/dfs/data/tmp/client-8395631522349067878 (No such file or directory)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:106)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1323)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.flush(DFSClient.java:1274)
>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.write(DFSClient.java:1256)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>     at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.write(ChecksumFileSystem.java:402)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>     at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:775)
>     at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:158)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:187)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1709)
>
> So it seems like the bug reported in HADOOP-758 still exists.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
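The suggested fix can be sketched with a standalone example (this is not the DFSClient patch itself; the class and method names here are illustrative). It contrasts a random-number naming scheme, where two concurrent clients can derive the same path, with File.createTempFile, which atomically creates a fresh file and returns a name guaranteed not to collide:

```java
import java.io.File;
import java.io.IOException;

public class UniqueTempFile {
    // Alternative named in the comment: File.createTempFile atomically
    // creates a brand-new, uniquely named file, so two concurrent callers
    // can never be handed the same path. By contrast, a name derived from
    // a shared random source (e.g. "client-" + random.nextLong(), as in
    // the stack trace above) can repeat across clients.
    static File create() throws IOException {
        File f = File.createTempFile("client-", ".tmp");
        f.deleteOnExit(); // clean up when the JVM exits
        return f;
    }

    public static void main(String[] args) throws IOException {
        File a = create();
        File b = create();
        // Even back-to-back calls yield distinct files that already exist
        // on disk, so no later FileNotFoundException on open.
        System.out.println(a.exists() && b.exists() && !a.equals(b));
    }
}
```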