From: Dennis Kubes
Date: Fri, 12 Oct 2007 17:47:09 -0500
To: nutch-user@lucene.apache.org, hadoop-user@lucene.apache.org
Subject: File Paths, Hadoop >= 0.15 and Local Jobs

Just in case this can help somebody else, and because I just spent a
couple of hours debugging this, I thought I would share an insight. This
only affects locally running jobs, not the DFS, and should only affect
Windows users.

On Windows with Hadoop 0.14 and below, you used to be able to do
something like this:

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
    <description>A base for other temporary directories.</description>
  </property>

essentially ignoring the C: drive letter. Well, in Hadoop 0.15 and above,
while Hadoop won't complain when the job starts, you will start getting
errors such as this while running jobs such as the Nutch Injector:

java.io.IOException: Target file:/C:/nutch/hadoop/mapred/temp/inject-temp-241790994/_reduce_bcubf6/part-00000 already exists
        at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)

What is happening here is that the file system code in Hadoop has
changed, so some Path objects are getting resolved to / and some are
getting resolved to C:/. (See the RawLocalFileStatus(File f) constructor
in RawLocalFileSystem if you are interested; it happens in the
f.toURI().toString() constructor parameter.) Hadoop sometimes creates
relative paths to move files around, so a relative path of C:/ resolved
against a path of / becomes /C:/..., which is an absolute path, and the
job fails because it can't copy a file onto itself.

So, long story short: on Windows, when running local jobs with Hadoop
>= 0.15, always use the C:/ notation to avoid problems.

Dennis Kubes
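
As a concrete illustration of that last point, here is a minimal sketch
of what the drive-qualified setting could look like in hadoop-site.xml.
The C:/tmp/hadoop value is only an example path, not taken from the
message above:

  <property>
    <name>hadoop.tmp.dir</name>
    <!-- example value with the drive letter spelled out -->
    <value>C:/tmp/hadoop</value>
    <description>A base for other temporary directories.</description>
  </property>

With the drive letter spelled out, every Path derived from
hadoop.tmp.dir resolves to the same C:/ form, so the relative-path
composition described above no longer produces mixed /C:/ results.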
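
And for anyone curious about the mechanism, below is a small,
standalone Java sketch (assumed illustration code, not Hadoop source)
of the java.io.File behaviour that RawLocalFileStatus picks up through
f.toURI().toString(): on Windows, toURI() makes a drive-less path
absolute and in doing so adds the current drive letter.

import java.io.File;

// Hypothetical illustration, not Hadoop code: shows how java.io.File
// treats a drive-less path on Windows, which is the behaviour
// RawLocalFileStatus inherits through f.toURI().toString().
public class DrivePathSketch {
    public static void main(String[] args) {
        // A drive-less path, as configured for hadoop.tmp.dir above.
        File f = new File("/tmp/hadoop");

        // Prints the path as given (e.g. "\tmp\hadoop" on Windows):
        // still no drive letter.
        System.out.println("getPath(): " + f.getPath());

        // toURI() first makes the path absolute, so on Windows with
        // the current drive being C: it prints something like
        // "file:/C:/tmp/hadoop" -- the drive letter has appeared.
        System.out.println("toURI():   " + f.toURI());
    }
}

Running this on Windows with the working directory on C: shows the two
forms side by side, which is exactly the / versus C:/ mismatch the
message describes.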