hadoop-common-user mailing list archives

From Jason Venner <ja...@attributor.com>
Subject Re: too many open files? Isn't 4K enough???
Date Wed, 05 Nov 2008 23:33:36 GMT
We just went from an 8k to a 64k open-file limit after running into some problems of our own.
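In case the specifics help: the usual way to make that change is the nofile entries in /etc/security/limits.conf, followed by a restart of the Hadoop daemons, since running processes (and the children they spawn) keep whatever limit they started with. The user name below is only an example; use whichever account runs the datanode and tasktracker:

    hadoop   soft   nofile   65536
    hadoop   hard   nofile   65536

Note that limits.conf is applied by PAM at login, so a daemon started at boot or from a non-login shell may never pick up the new value; "ulimit -n" in a fresh shell for that user shows what newly started processes will get.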

Karl Anderson wrote:
>
> On 4-Nov-08, at 3:45 PM, Yuri Pradkin wrote:
>
>> Hi,
>>
>> I'm running the current snapshot (-r709609), doing a simple word count
>> using Python over streaming.  I have a relatively moderate setup of 17 nodes.
>>
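For anyone trying to reproduce this, a streaming word-count mapper along the following lines is presumably what is running here; the actual script isn't shown in the thread, so treat this as a sketch with assumed names:

    #!/usr/bin/env python
    # Minimal word-count mapper for Hadoop streaming (a sketch, not the
    # poster's actual script).  Reads lines on stdin and emits one
    # "word<TAB>1" record per token on stdout; a reducer would sum them.
    import sys

    def main():
        for line in sys.stdin:
            for word in line.split():
                sys.stdout.write("%s\t1\n" % word)

    if __name__ == "__main__":
        main()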
>> I'm getting this exception:
>>
>> java.io.FileNotFoundException:
>> /usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_0/output/spill4055.out.index
>> (Too many open files)
>
>> [...]
>
>> I see that AFTER I've reconfigured the max allowable open files to 4096!
>
> I am running into a similar issue.  It seems to be affected by the
> number of simultaneous tasks.
>
> Here's one example.  I have a mapper-only streaming job with 512 tasks
> and a total combined output of 93 megs for a successful run.  The tasks
> don't interact with HDFS other than what streaming sets up with stdin
> and stdout.
>
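For reference, a mapper-only streaming job of that shape is typically launched along these lines; the paths are placeholders, and on 0.17/0.18 per-job settings go through -jobconf (zero reduces makes it map-only):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
        -input   /user/karl/input \
        -output  /user/karl/output \
        -mapper  mapper.py \
        -file    mapper.py \
        -jobconf mapred.reduce.tasks=0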
> On a 32-slave cluster with 8 max mappers per node, I see 27 failed
> task attempts and 5 black-listed task trackers by the time the job
> fails, although this will change from run to run.  The failures are
> all file-handle related: "can't run mapper, no such file or directory
> <mapper source>", "IOException: can't run bash, too many open files",
> and many unknown failures because the admin interface couldn't find
> the log files.  This happens with both Hadoop 17.2.1 and 18.1.
>
> On a 128-slave cluster with 2 max mappers per node, same job, same
> number of tasks, I get no failures.  This is running Hadoop 18.1.
>
> I'm running on EC2 clusters created with the hadoop-ec2 tools.  The
> 2-mapper nodes are small EC2 instances, and the 8-mapper nodes are
> xlarge EC2 instances.  I upped the nofile limit in
> /etc/security/limits.conf to 131072 for all users on all of my EC2
> images, but it didn't help.  I'm never running more than one job at
> once.
>
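One thing worth checking before trusting the limits.conf change: those limits are applied by PAM to login sessions, so a tasktracker started at boot or over a non-interactive ssh may never see the higher value, and the streaming children simply inherit whatever the tasktracker has. A throwaway streaming mapper like the sketch below (purely a diagnostic, not anything from this thread) reports the limit the tasks actually run with:

    #!/usr/bin/env python
    # Diagnostic streaming "mapper": print the fd limit the task really has.
    import resource
    import sys

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    sys.stdout.write("nofile-in-task\tsoft=%d hard=%d\n" % (soft, hard))

    # Drain stdin so the streaming framework doesn't treat the early exit
    # as a broken pipe.
    for _ in sys.stdin:
        pass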
> The hadoop-ec2 tools launch clusters with one master which runs the
> namenode and jobtracker, and slaves each running a datanode and
> tasktracker.  It seems that running more than 2 mappers per node isn't
> feasible with this setup, which surprises me because the cluster setup
> suggestions I've read advise using far more.  Would changing the ratio
> of datanodes to tasktrackers have an effect?  Is this done in
> practice?
>
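On the mappers-per-node question: the per-node cap itself is mapred.tasktracker.map.tasks.maximum in hadoop-site.xml on each tasktracker (the hadoop-ec2 images set it per instance size, as far as I know). Something like the following, with 8 purely as an example value, is what an xlarge node would carry; open-file pressure on a node scales roughly with this number, since each concurrently running task holds its own spill files and pipes:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
      <description>Maximum number of map tasks run at once by this tasktracker.</description>
    </property>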
> Are you running more than 2 mappers per node?  Do you see any
> differences in the number of failed tasks when you change the number
> of tasks over the same input set?
>
> In general, I've had to do a lot of fine-tuning of my job parameters
> to balance memory, file handles, and task timeouts.  I'm finding that
> a setup that works with one input set breaks when I try it on an input
> set which is twice the size.  My productivity is not high while I'm
> figuring this out, and I wonder why I don't hear about this more.
> Perhaps this is a streaming issue, and streaming isn't being used very
> much?
>
>
> Karl Anderson
> kra@monkey.org
> http://monkey.org/~kra
>
>
>
