hadoop-common-user mailing list archives

From: Johannes Zillmann <...@101tec.com>
Subject: Re: too many open files error
Date: Thu, 02 Oct 2008 12:22:10 GMT
Having a similar problem.
After upgrading from hadoop 0.16.4 to 0.17.2.1 we're facing
"java.io.IOException: java.io.IOException: Too many open files" after
a few jobs.
For example:
Error message from task (reduce) tip_200810020918_0014_r_000031 Error
initializing task_200810020918_0014_r_000031_1:
java.io.IOException: java.io.IOException: Too many open files
         at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
         at java.lang.ProcessImpl.start(ProcessImpl.java:65)
         at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
         at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
         at org.apache.hadoop.util.Shell.run(Shell.java:134)
         at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
         at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:646)
         at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1271)
         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:912)
         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1307)
         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2266)


Once a job has failed because of this exception, all subsequent jobs
fail for the same reason.
After a cluster restart it works fine for a few jobs again...
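
The trace above dies inside UNIXProcess.<init>, i.e. the TaskTracker
can no longer even fork the "df" helper once its file-descriptor limit
is exhausted. As a rough, self-contained sketch (not Hadoop's actual
Shell/DF code) of why this shows up at fork time: every spawned Process
holds three pipe descriptors on the parent side, so anything that runs
external commands and never closes those streams eats descriptors until
even process creation fails.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Toy stand-in for running "df" the way Shell.runCommand does.
public class DfProbe {

    // Returns the first line of output from the given command.
    public static String firstLine(String... cmd)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).start();
        try {
            BufferedReader out = new BufferedReader(
                    new InputStreamReader(p.getInputStream()));
            String line = out.readLine();
            p.waitFor();
            return line;
        } finally {
            // The Process keeps stdin/stdout/stderr pipes open on our side.
            // Skip these closes (or p.destroy()) and every call leaks three
            // descriptors until the GC finalizes the Process -- which may be
            // far too late on a busy TaskTracker with a 1024 fd limit.
            p.getOutputStream().close();
            p.getInputStream().close();
            p.getErrorStream().close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstLine("df", "-k", "/tmp"));
    }
}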

Johannes

On Sep 27, 2008, at 1:59 AM, Karl Anderson wrote:

>
> On 26-Sep-08, at 3:09 PM, Eric Zhang wrote:
>
>> Hi,
>> I encountered the following FileNotFoundException, resulting from a
>> "too many open files" error, when I tried to run a job.  The job had
>> been run several times before without problems.  I am confused by the
>> exception because my code closes all the files, and even if it
>> didn't, the job only has 10-20 small input/output files.
>> The limit on open files on my box is 1024.  Besides, the error
>> seemed to happen even before the task was executed.  I am using the
>> 0.17 version.  I'd appreciate it if somebody can shed some light on
>> this issue.  BTW, the job ran OK after I restarted hadoop.  Yes, the
>> hadoop-site.xml did exist in that directory.
>
> I had the same errors, including the bash one.  Running one  
> particular job would cause all subsequent jobs of any kind to fail,  
> even after all running jobs had completed or failed out.  This was  
> confusing because the failing jobs themselves often had no
> relationship to the cause; they were just in a bad environment.
>
> If you can't successfully run a dummy job (with the identity mapper  
> and reducer, or a streaming job with cat) once you start getting  
> failures, then you are probably in the same situation.
>
> I believe that the problem was caused by increasing the timeout, but  
> I never pinned it down enough to submit a Jira issue.  It might have  
> been the XML reader or something else.  I was using streaming,  
> hadoop-ec2, and either 0.17.0 or 0.18.0.  It would happen just as  
> rapidly after I made an ec2 image with a higher open file limit.
>
> Eventually I figured it out by running each job in my pipeline 5 or  
> so times before trying the next one, which let me see which job was  
> causing the problem (because it would eventually fail itself, rather  
> than hosing a later job).
>
> Karl Anderson
> kra@monkey.org
> http://monkey.org/~kra
>
>
>
>
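
For reference, here is a minimal "dummy" canary job along the lines
Karl suggests above: identity mapper and reducer, nothing else. This is
only a sketch against the 0.18-era org.apache.hadoop.mapred API (on
0.17 the input/output path setters live on JobConf instead), and the
paths are placeholders:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Canary job: pipes its text input through an identity map and reduce.
// If even this fails with "Too many open files", the cluster is already
// in the bad state and the job that just failed is probably not the one
// leaking descriptors.
public class IdentityCanary {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(IdentityCanary.class);
        conf.setJobName("identity-canary");

        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        // TextInputFormat (the default) produces LongWritable/Text pairs,
        // and the identity map/reduce passes them straight through.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        // Placeholder paths -- point them at any small existing input.
        FileInputFormat.setInputPaths(conf, new Path("/tmp/canary-in"));
        FileOutputFormat.setOutputPath(conf, new Path("/tmp/canary-out"));

        JobClient.runJob(conf);
    }
}

Running it a few times between the real jobs in the pipeline, as Karl
describes, shows which job actually pushes the cluster over the edge.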

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec GmbH
Halle (Saale), Saxony-Anhalt, Germany
http://www.101tec.com

