hadoop-common-user mailing list archives

From Dennis Kubes <ku...@apache.org>
Subject File Paths, Hadoop >= 0.15 and Local Jobs
Date Fri, 12 Oct 2007 22:47:09 GMT
Just in case this can help somebody else, and because I just spent a
couple of hours debugging this, I thought I would share an insight.  This
only affects locally running jobs, not the DFS, and should only affect
Windows users.

On Windows with Hadoop 0.14 and below, you used to be able to do
something like this:

<property>
   <name>hadoop.tmp.dir</name>
   <value>/tmp/hadoop</value>
   <description>A base for other temporary directories.</description>
</property>

This essentially ignores the C: drive letter. In Hadoop 0.15 and above,
however, Hadoop won't complain when the job starts, but you will start
getting errors such as this while running jobs such as the Nutch injector:

java.io.IOException: Target file:/C:/nutch/hadoop/mapred/temp/inject-temp-241790994/_reduce_bcubf6/part-00000 already exists
	at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)

What is happening here is that the file system code in Hadoop has changed,
so some Path objects are getting resolved to / and some are getting
resolved to C:/.  (See the RawLocalFileStatus(File f) constructor in
RawLocalFileSystem if you are interested.  It happens in the
f.toURI().toString() value passed as a constructor parameter.)

Hadoop sometimes creates relative paths to move files around, so the
relative path of a C:/ path from a / path becomes /C:/..., which is an
absolute path, and the job fails because it can't copy the file onto itself.

So, long story short: on Windows, when running local jobs with Hadoop >=
0.15, always use the C:/ notation to avoid problems.
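Applied to the hadoop.tmp.dir property, that would look something like
the following. This is only a sketch that assumes the temporary directory
lives on the C: drive; substitute your own drive letter and path.

<property>
   <name>hadoop.tmp.dir</name>
   <!-- Assumption: the temp directory is on the C: drive. -->
   <value>C:/tmp/hadoop</value>
   <description>A base for other temporary directories.</description>
</property>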

Dennis Kubes
