hadoop-common-dev mailing list archives

From Ankur Goel <ankur.g...@corp.aol.com>
Subject Re: IPC Client error | Too many files open
Date Fri, 10 Oct 2008 07:09:07 GMT
We have only one thread per process doing HDFS I/O, and there are around
30 such processes, each running on a different machine. So there's a 1:1:1
mapping of machine:Apache:logwriter.
I think HADOOP-4346 explains a lot.

Thanks Raghu :-)

Raghu Angadi wrote:
>
> I filed https://issues.apache.org/jira/browse/HADOOP-4346 and it might 
> explain what's happening here.
>
> Raghu.
>
> Raghu Angadi wrote:
>>
>> We are seeing a similar issue at Yahoo! as well. 'jmap -histo' and 
>> 'jmap -histo:live' are turning out to be pretty helpful. Stay tuned.
>>
>> How many threads do you expect to be doing HDFS I/O in your case? 
>> Both the max and normal cases are helpful.
>>
>> Thanks,
>> Raghu.
>>
>> Goel, Ankur wrote:
>>> Hi Dhruba,
>>> Thanks for the reply.
>>> 1. We are using version 0.17.2 of Hadoop.
>>> 2. The max file descriptor limit per process at the time the error occurred
>>> was 1024. lsof -p <java-proc-id> confirms this, as the process ran out of
>>> file handles after reaching the limit. Here is the snippet...
>>> java    2171 root    0r  FIFO                0,7          23261756 pipe
>>> java    2171 root    1w   CHR                1,3              2067
>>> /dev/null
>>> java    2171 root    2w  FIFO                0,7          23261747 pipe
>>> ..
>>> ..
>>> java    2171 root 1006w  FIFO                0,7          26486656 pipe
>>> java    2171 root 1007r  0000                0,8        0 26486657
>>> eventpoll
>>> java    2171 root 1008r  FIFO                0,7          26492141 pipe
>>> java    2171 root 1009w  FIFO                0,7          26492141 pipe
>>> java    2171 root 1010r  0000                0,8        0 26492142
>>> eventpoll
>>> java    2171 root 1011r  FIFO                0,7          26497184 pipe
>>> java    2171 root 1012w  FIFO                0,7          26497184 pipe
>>> java    2171 root 1013r  0000                0,8        0 26497185
>>> eventpoll
>>> java    2171 root 1014r  FIFO                0,7          26514795 pipe
>>> java    2171 root 1015w  FIFO                0,7          26514795 pipe
>>> java    2171 root 1016r  0000                0,8        0 26514796
>>> eventpoll
>>> java    2171 root 1017r  FIFO                0,7          26510109 pipe
>>> java    2171 root 1018w  FIFO                0,7          26510109 pipe
>>> java    2171 root 1019r  0000                0,8        0 26510110
>>> eventpoll
>>> java    2171 root 1020u  IPv6           27549169               TCP
>>> server.domain.com:46551->hadoop.aol.com:9000 (ESTABLISHED)
>>> java    2171 root 1021r  FIFO                0,7          26527653 pipe
>>> java    2171 root 1022w  FIFO                0,7          26527653 pipe
>>> java    2171 root 1023u  IPv6           26527645               TCP
>>> server.domain.com:15245->hadoop.aol.com:9000 (CLOSE_WAIT)
>>>
>>> We tried upping the limit and restarting the servers, but the problem
>>> recurred after 1-2 days.
>>>
>>> 3. Yes, there are multiple threads in the Apache server, which are
>>> created dynamically.
>>> 4. The Java log writer plugged into Apache's custom log closes and
>>> reopens the log file periodically. The log writer has 2 threads: one that
>>> writes data to an FSDataOutputStream, and another that wakes up
>>> periodically to close the old stream and open a new one. I am trying to
>>> see if this is the place where file handles could be leaking.
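
(For illustration only, a minimal sketch of the rotation pattern described in
point 4 above, using Hadoop's FileSystem / FSDataOutputStream API. The class
and method names below are made up, not the actual log writer; the point of
interest is that the old stream has to be closed when the new one is opened,
otherwise its descriptors accumulate.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RotatingHdfsLogWriter {
    private final FileSystem fs;
    private FSDataOutputStream out;               // guarded by 'this'

    public RotatingHdfsLogWriter(Configuration conf, Path first) throws IOException {
        fs = FileSystem.get(conf);
        out = fs.create(first);
    }

    // Called by the writer thread for every log record.
    public synchronized void write(byte[] record) throws IOException {
        out.write(record);
    }

    // Called by the roll-over thread. Closing the old stream before dropping
    // the reference to it is what keeps descriptors from piling up.
    public synchronized void rotate(Path next) throws IOException {
        FSDataOutputStream old = out;
        out = fs.create(next);
        old.close();                              // forgetting this leaks handles
    }

    public synchronized void close() throws IOException {
        out.close();
    }
}
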
>>> Another thing to note is that we have a signal handler implementation
>>> that uses the sun.misc package. The signal handler is installed in the
>>> Java processes to ensure that when Apache sends the Java process SIGTERM
>>> or SIGINT, we close the file handles properly.
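
(Likewise a sketch only: one way a sun.misc-based handler could tie SIGTERM and
SIGINT to closing the HDFS stream. RotatingHdfsLogWriter is the hypothetical
class from the previous sketch, not the real implementation.)

import sun.misc.Signal;
import sun.misc.SignalHandler;

public class ShutdownSignals {
    public static void install(final RotatingHdfsLogWriter writer) {
        SignalHandler closeAndExit = new SignalHandler() {
            public void handle(Signal sig) {
                try {
                    writer.close();               // release the HDFS file handle
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    System.exit(0);
                }
            }
        };
        Signal.handle(new Signal("TERM"), closeAndExit);  // Apache stop / kill
        Signal.handle(new Signal("INT"), closeAndExit);   // Ctrl-C
    }
}
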
>>>
>>> I will do some more analysis of our code to find out whether it's an
>>> issue in our code or in the HDFS client. If I find it's an HDFS client
>>> issue, I'll move this discussion to a Hadoop JIRA.
>>>
>>> Thanks and Regards
>>> -Ankur
>

