hadoop-common-dev mailing list archives

From "Yuri Pradkin (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4614) "Too many open files" error while processing a large gzip file
Date Sat, 08 Nov 2008 00:33:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuri Pradkin updated HADOOP-4614:
---------------------------------

    Attachment: openfds.txt

I'm posting the results of lsof while running Abdul's code. 

After I upped the max number of fds to 16K, the job ran to completion.
I was monitoring the number of open files/processes every 15s (by simply 
running ps and lsof | wc -l) and saw this:
#processes   open_files
...
13   646
13   648
12   2535
13   4860
12   4346
12   3842
12   3324
12   2823
12   2316
12   1852
12   1387
12   936
12   643
12   643
12   643
12   643
12   643
12   643
13   642
12   642
12   4775
12   2738
12   917
12   643
12   642
12   4992
12   4453
12   3943
12   3299
12   2855
12   2437
...

It looks like something (garbage collection?) cleans up fds periodically; the
max I saw was 5007 (though there may have been higher counts between the 15s samples).
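
If the cleanup really is coming from garbage collection, that would mean the streams are
being closed by finalizers rather than explicitly.  Below is a minimal, stand-alone sketch
(not Hadoop code) that reproduces that effect on Linux; the class name FdLeakDemo, the use
of /proc/self/fd for counting descriptors, and the /etc/hosts path are just assumptions for
illustration:

    import java.io.File;
    import java.io.FileInputStream;
    import java.util.ArrayList;
    import java.util.List;

    public class FdLeakDemo {
        // Count this JVM's open descriptors by listing /proc/self/fd (Linux only).
        static int openFds() {
            String[] fds = new File("/proc/self/fd").list();
            return fds == null ? 0 : fds.length;
        }

        public static void main(String[] args) throws Exception {
            List<FileInputStream> leaked = new ArrayList<FileInputStream>();
            for (int i = 0; i < 500; i++) {
                leaked.add(new FileInputStream("/etc/hosts"));  // opened, never closed
            }
            System.out.println("after opening: " + openFds());

            leaked.clear();            // drop the references; the streams become unreachable
            System.gc();               // finalizers close the underlying descriptors
            System.runFinalization();
            Thread.sleep(1000);        // give the finalizer thread a moment to run
            System.out.println("after GC:      " + openFds());
        }
    }

If the count only drops after the explicit System.gc() call, the cleanup is coming from
finalization, which would match the sawtooth pattern in the samples above.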

> "Too many open files" error while processing a large gzip file
> --------------------------------------------------------------
>
>                 Key: HADOOP-4614
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4614
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.18.2
>            Reporter: Abdul Qadeer
>             Fix For: 0.18.3
>
>         Attachments: openfds.txt
>
>
> I am running a simple word count program on gzip-compressed data of size 4 GB (uncompressed
> size is about 7 GB).  I have a setup of 17 nodes in my Hadoop cluster.  After some time, I get
> the following exception:
> java.io.FileNotFoundException: /usr/local/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200811041109_0003/attempt_200811041109_0003_m_000000_0/output/spill4055.out.index (Too many open files)
>        at java.io.FileInputStream.open(Native Method)
>        at java.io.FileInputStream.<init>(FileInputStream.java:137)
>        at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:62)
>        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:98)
>        at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:168)
>        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
>        at org.apache.hadoop.mapred.IndexRecord.readIndexFile(IndexRecord.java:47)
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.getIndexInformation(MapTask.java:1339)
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1237)
>        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:857)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>        at org.apache.hadoop.mapred.Child.main(Child.java:155)
> From a user's perspective, I know that Hadoop will use only one mapper for a gzipped file.
> The above exception suggests that Hadoop probably puts the intermediate data into many files.
> But the question is: exactly how many open files are required in the worst case, for any data
> size and cluster size?  Currently it looks as if Hadoop needs more open files as the input
> size or the cluster size (in terms of nodes, mappers, reducers) increases.  This is a problem
> as far as scalability is concerned.  A user needs to write some number in the
> /etc/security/limits.conf file saying how many open files a Hadoop node is allowed.  The
> question is: what should that "magical number" be?
> So probably the best solution to this problem is to change Hadoop in such a way that it can
> work with some moderate number of allowed open files (e.g. 4K), or to suggest some other
> number as an upper limit, so that a user can be sure that for any data size and cluster size,
> Hadoop will not run into this "too many open files" issue.
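
For context, the pattern described above (more open files as the input grows, with one mapper
handling the whole gzipped file) is what you would see if the final merge keeps one input
stream open per spill file.  The following is only a schematic sketch of that descriptor
pattern; the class and method names (SpillMergeSketch, mergeSpills) are hypothetical and this
is not Hadoop's actual merge code:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class SpillMergeSketch {
        // Schematic only: a single-pass merge opens one input stream per spill file,
        // so the number of descriptors in use grows with the number of spills; a
        // gzipped input is handled by a single map task, so all of its spills belong
        // to that one task.
        static void mergeSpills(List<File> spillFiles, File output) throws IOException {
            List<FileInputStream> inputs = new ArrayList<FileInputStream>();
            try {
                for (File spill : spillFiles) {
                    inputs.add(new FileInputStream(spill));  // one fd per spill, all open at once
                }
                // ... k-way merge of the sorted spill segments into 'output' ...
            } finally {
                for (FileInputStream in : inputs) {
                    in.close();                              // released only after the whole merge
                }
            }
        }
    }

The exception above references spill4055.out.index, i.e. thousands of spill files for a single
map task, which would be consistent with the several thousand open descriptors seen in the
lsof samples if the index files are opened in a pattern like this.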

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

