Message-ID: <33305457.1182492626050.JavaMail.jira@brutus>
Date: Thu, 21 Jun 2007 23:10:26 -0700 (PDT)
From: "dhruba borthakur (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1513) A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class
In-Reply-To: <30436065.1182414505961.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507135 ]

dhruba borthakur commented on HADOOP-1513:
------------------------------------------

Two mkdirs() calls cannot interleave. If you look at FSNamesystem.mkdirsInternal(), it is synchronized on the global FSNamesystem lock. I believe that the scenario Devaraj described "cannot happen".

> A likely race condition between the creation of a directory and checking for its existence in the DiskChecker class
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1513
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1513
>             Project: Hadoop
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.14.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>            Priority: Critical
>             Fix For: 0.14.0
>
>         Attachments: 1513.patch
>
>
> Got this exception in a job run. It looks like the problem is a race condition between the creation of a directory and checking for its existence. Specifically, the line:
> if (!dir.exists() && !dir.mkdirs())
> doesn't seem safe when invoked by multiple processes at the same time.
> 2007-06-21 07:55:33,583 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
> 2007-06-21 07:55:33,818 WARN org.apache.hadoop.fs.AllocatorPerContext: org.apache.hadoop.util.DiskChecker$DiskErrorException: can not create directory: /export/crawlspace/kryptonite/ddas/dfs/data/tmp
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:26)
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:211)
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:248)
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:276)
>         at org.apache.hadoop.fs.LocalDirAllocator.createTmpFileForWrite(LocalDirAllocator.java:155)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.newBackupFile(DFSClient.java:1171)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1136)
>         at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:342)
>         at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.create(DistributedFileSystem.java:145)
>         at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.<init>(ChecksumFileSystem.java:368)
>         at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:254)
>         at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:675)
>         at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:165)
>         at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:137)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:189)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1740)
> 2007-06-21 07:55:33,821 WARN org.apache.hadoop.mapred.TaskTracker: Error running child

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
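
For illustration only, here is a minimal, self-contained sketch of the race-tolerant pattern the quoted description asks for. This is not the attached 1513.patch, and the class and method names below are invented for the example. Instead of testing exists() before mkdirs(), it calls mkdirs() first and then re-checks, so a directory created concurrently by another process or thread is not reported as an error:

  import java.io.File;
  import java.io.IOException;

  // Hypothetical sketch, not the actual DiskChecker code: tolerate a
  // concurrent mkdirs() by another process by verifying the directory
  // exists after attempting to create it, rather than before.
  public class DirCreationSketch {

      public static void ensureDirectory(File dir) throws IOException {
          // mkdirs() may return false either because the directory already
          // exists (possibly created by a racing process) or because the
          // creation genuinely failed; the isDirectory() re-check
          // distinguishes the two cases.
          if (!dir.mkdirs() && !dir.isDirectory()) {
              throw new IOException("can not create directory: " + dir);
          }
      }

      public static void main(String[] args) throws IOException {
          ensureDirectory(new File(System.getProperty("java.io.tmpdir"),
                                   "hadoop-1513-demo"));
      }
  }

With this ordering, a false return from mkdirs() caused by a racing creator looks the same as the directory already existing, which is exactly the case that should not raise the DiskErrorException shown in the log above.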