hadoop-common-dev mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: IPC Client error | Too many files open
Date Sun, 05 Oct 2008 06:21:54 GMT

I filed https://issues.apache.org/jira/browse/HADOOP-4346 and it might 
explain what's happening here.

Raghu.

Raghu Angadi wrote:
> 
> We are seeing a similar issue at Yahoo! as well. 'jmap -histo' and 
> 'jmap -histo:live' are turning out to be pretty helpful. Stay tuned.
> 
> How many threads do you expect to be doing HDFS I/O in your case? Both 
> the maximum and the typical counts would be helpful.
> 
> Thanks,
> Raghu.
> 
> Goel, Ankur wrote:
>> Hi Dhruba,
>>          Thanks for the reply.
>> 1. We are using Hadoop version 0.17.2.
>> 2. The maximum file descriptor limit per process at the time the error
>> occurred was 1024. lsof -p <java-proc-id> confirms this, as the process
>> ran out of file handles after reaching the limit. Here is a snippet...
>> java    2171 root    0r  FIFO                0,7          23261756 pipe
>> java    2171 root    1w   CHR                1,3              2067 /dev/null
>> java    2171 root    2w  FIFO                0,7          23261747 pipe
>> ..
>> ..
>> java    2171 root 1006w  FIFO                0,7          26486656 pipe
>> java    2171 root 1007r  0000                0,8        0 26486657 eventpoll
>> java    2171 root 1008r  FIFO                0,7          26492141 pipe
>> java    2171 root 1009w  FIFO                0,7          26492141 pipe
>> java    2171 root 1010r  0000                0,8        0 26492142 eventpoll
>> java    2171 root 1011r  FIFO                0,7          26497184 pipe
>> java    2171 root 1012w  FIFO                0,7          26497184 pipe
>> java    2171 root 1013r  0000                0,8        0 26497185 eventpoll
>> java    2171 root 1014r  FIFO                0,7          26514795 pipe
>> java    2171 root 1015w  FIFO                0,7          26514795 pipe
>> java    2171 root 1016r  0000                0,8        0 26514796 eventpoll
>> java    2171 root 1017r  FIFO                0,7          26510109 pipe
>> java    2171 root 1018w  FIFO                0,7          26510109 pipe
>> java    2171 root 1019r  0000                0,8        0 26510110 eventpoll
>> java    2171 root 1020u  IPv6           27549169               TCP server.domain.com:46551->hadoop.aol.com:9000 (ESTABLISHED)
>> java    2171 root 1021r  FIFO                0,7          26527653 pipe
>> java    2171 root 1022w  FIFO                0,7          26527653 pipe
>> java    2171 root 1023u  IPv6           26527645               TCP server.domain.com:15245->hadoop.aol.com:9000 (CLOSE_WAIT)
>>
>> We tried raising the limit and restarting the servers, but the problem
>> recurred after 1-2 days.
>>
>> 3. Yes, there are multiple threads in the Apache server, and they are
>> created dynamically.
>> 4. The Java log writer plugged into the Apache custom log closes its
>> log file and opens a new one periodically. The log writer has two
>> threads: one writes data to an FSDataOutputStream, and the other wakes
>> up periodically to close the old stream and open a new one. I am trying
>> to see if this is where file handles could be leaking.
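
As a rough sketch of the rotation pattern described above (class, method,
and path names here are illustrative, not taken from the actual code), the
roll step is the natural place for such a leak: if the roller thread drops
its reference to the old FSDataOutputStream without closing it, every
rotation leaks the sockets and pipes behind that stream.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative two-thread rolling writer: one thread calls write(),
    // another calls roll() on a timer. If roll() forgets to close the old
    // stream, each rotation leaks the descriptors backing that stream.
    public class RollingHdfsLogWriter {
      private final FileSystem fs;
      private FSDataOutputStream out;

      public RollingHdfsLogWriter(Configuration conf, Path firstFile)
          throws IOException {
        this.fs = FileSystem.get(conf);
        this.out = fs.create(firstFile);
      }

      // Writer thread.
      public synchronized void write(byte[] record, int off, int len)
          throws IOException {
        out.write(record, off, len);
      }

      // Roller thread: open the new file, swap it in, then always close
      // the old stream so its sockets/pipes are released.
      public synchronized void roll(Path nextFile) throws IOException {
        FSDataOutputStream old = out;
        out = fs.create(nextFile);
        old.close();
      }

      public synchronized void close() throws IOException {
        out.close();
      }
    }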
>> Another thing to note is that we have a signal handler implementation
>> that uses the sun.misc package. The signal handler is installed in the
>> Java processes to ensure that when Apache sends the process SIGTERM or
>> SIGINT, we close the file handles properly.
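
A sun.misc-based handler of the kind described above can be wired up
roughly like this (again, names are illustrative); the key point is that
the handler closes the open HDFS streams before the JVM exits.

    import java.io.Closeable;
    import sun.misc.Signal;
    import sun.misc.SignalHandler;

    public class ShutdownSignals {
      // Install handlers so SIGTERM/SIGINT from Apache close the HDFS
      // streams before the JVM exits. 'resource' is whatever object holds
      // the open FSDataOutputStream(s).
      public static void install(final Closeable resource) {
        SignalHandler handler = new SignalHandler() {
          public void handle(Signal sig) {
            try {
              resource.close();   // release file handles/descriptors
            } catch (Exception e) {
              // best effort during shutdown; nothing useful to do here
            }
            System.exit(0);
          }
        };
        Signal.handle(new Signal("TERM"), handler);
        Signal.handle(new Signal("INT"), handler);
      }
    }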
>>
>> I will do some more analysis to find out whether the issue is in our
>> code or in the HDFS client. If it turns out to be an HDFS client issue,
>> I'll move this discussion to a Hadoop JIRA.
>>
>> Thanks and Regards
>> -Ankur

