hadoop-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: File does not exist on part-r-00000 file after reducer runs
Date Mon, 11 Feb 2013 18:43:28 GMT
I am not sure of everything that may be causing this, especially because the stack trace is
cut off. Your file lease has expired on the output file.  Typically the client is supposed to
keep the file lease up to date, so if RPC had a very long hiccup you may be getting this
problem.  It could also be related to the OutputCommitter in another task deleting the file
out from under this task.


From: David Parks <davidparks21@yahoo.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Monday, February 11, 2013 12:02 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: File does not exist on part-r-00000 file after reducer runs

Are there any rules against writing results to Reducer.Context while in the cleanup() method?

I’ve got a reducer that is downloading a few tens of millions of images from a set of
URLs fed to it.

To be efficient I run many connections in parallel, but limit connections per domain and frequency
of connections.

In order to do that efficiently I read in many URLs from the reduce method and queue them
in a processing queue. At some point we have read in all the data and Hadoop calls the
cleanup() method, where I block until all threads have finished processing.
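
For readers following along, here is a minimal sketch of the pattern described above: work is queued while records arrive, and the final hook blocks until the worker threads drain. It uses only java.util.concurrent, not the Hadoop API. The class name, the method names reduce/cleanup, the "fetched:" placeholder and the heartbeat callback are all hypothetical stand-ins, not the poster's code. In a real reducer the heartbeat might be something like context.progress(), but that is an assumption, and nothing here is claimed to prevent the lease error shown below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a reducer that queues work and drains it at the end.
class DrainInCleanupSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final List<String> output = new ArrayList<>();
    private final Runnable heartbeat; // assumption: called periodically while waiting

    public DrainInCleanupSketch(Runnable heartbeat) {
        this.heartbeat = heartbeat;
    }

    // Stand-in for reduce(): returns immediately after queuing the URL.
    public void reduce(String url) {
        pool.execute(() -> {
            String result = "fetched:" + url; // placeholder for the real download
            synchronized (output) {           // results are written one at a time
                output.add(result);
            }
        });
    }

    // Stand-in for cleanup(): blocks until every queued task has finished.
    public List<String> cleanup() {
        pool.shutdown(); // accept no new tasks, let queued ones run
        try {
            while (!pool.awaitTermination(100, TimeUnit.MILLISECONDS)) {
                heartbeat.run(); // signal liveness while still draining
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        synchronized (output) {
            return new ArrayList<>(output);
        }
    }

    public static void main(String[] args) {
        DrainInCleanupSketch sketch = new DrainInCleanupSketch(() -> {});
        for (String url : new String[] {"a", "b", "c"}) {
            sketch.reduce(url);
        }
        System.out.println(sketch.cleanup().size() + " results");
    }
}
```

The point of the sketch is only that all output is produced before cleanup() returns; whether the framework keeps the output file usable for that long is the open question in this thread.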

We may continue processing and writing results (in a synchronized manner) for 20 or 30 minutes
after Hadoop reports 100% input records delivered, then at the end, my code appears to exit
normally and I get this exception immediately after:

2013-02-11 05:15:23,606 INFO com.frugg.mapreduce.UrlProcessor (URL Processor Main Loop): Processing complete, shut down normally
2013-02-11 05:15:23,653 INFO org.apache.hadoop.mapred.TaskLogsTruncater (main): Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-02-11 05:15:23,685 INFO org.apache.hadoop.io.nativeio.NativeIO (main): Got UserName hadoop for UID 106 from the native implementation
2013-02-11 05:15:23,687 ERROR org.apache.hadoop.security.UserGroupInformation (main): PriviledgedActionException as:hadoop cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /frugg/image-cache-stage1/_temporary/_attempt_201302110210_0019_r_000002_0/part-r-00002 File does not exist. Holder DFSClient_attempt_201302110210_0019_r_000002_0 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1642)

I have a suspicion that I’m violating some subtle rules of Hadoop’s here.
