hadoop-common-dev mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: IPC Client error | Too many files open
Date Sat, 04 Oct 2008 05:33:52 GMT

We are seeing a similar issue at Yahoo! as well. 'jmap -histo' and 'jmap 
-histo:live' are turning out to be pretty helpful. Stay tuned.

How many threads do you expect to be doing HDFS I/O in your case? Both 
the max and the typical counts would be helpful.

Thanks,
Raghu.

Goel, Ankur wrote:
> Hi Dhruba,
>          Thanks for the reply. 
> 1. We are using 0.17.2 version of Hadoop.
> 2. The max file descriptor limit per process at the time the error
> occurred was 1024. lsof -p <java-proc-id> confirms this, as the process
> ran out of file handles after reaching that limit. Here is a snippet:
> java    2171 root    0r  FIFO                0,7          23261756 pipe
> java    2171 root    1w   CHR                1,3              2067 /dev/null
> java    2171 root    2w  FIFO                0,7          23261747 pipe
> ..
> ..
> java    2171 root 1006w  FIFO                0,7          26486656 pipe
> java    2171 root 1007r  0000                0,8        0 26486657 eventpoll
> java    2171 root 1008r  FIFO                0,7          26492141 pipe
> java    2171 root 1009w  FIFO                0,7          26492141 pipe
> java    2171 root 1010r  0000                0,8        0 26492142 eventpoll
> java    2171 root 1011r  FIFO                0,7          26497184 pipe
> java    2171 root 1012w  FIFO                0,7          26497184 pipe
> java    2171 root 1013r  0000                0,8        0 26497185 eventpoll
> java    2171 root 1014r  FIFO                0,7          26514795 pipe
> java    2171 root 1015w  FIFO                0,7          26514795 pipe
> java    2171 root 1016r  0000                0,8        0 26514796 eventpoll
> java    2171 root 1017r  FIFO                0,7          26510109 pipe
> java    2171 root 1018w  FIFO                0,7          26510109 pipe
> java    2171 root 1019r  0000                0,8        0 26510110 eventpoll
> java    2171 root 1020u  IPv6           27549169               TCP server.domain.com:46551->hadoop.aol.com:9000 (ESTABLISHED)
> java    2171 root 1021r  FIFO                0,7          26527653 pipe
> java    2171 root 1022w  FIFO                0,7          26527653 pipe
> java    2171 root 1023u  IPv6           26527645               TCP server.domain.com:15245->hadoop.aol.com:9000 (CLOSE_WAIT)
> 
> We tried upping the limit and restarting the servers, but the problem
> recurred after 1-2 days.
> 
> 3. Yes, there are multiple threads in the Apache server, and they are
> created dynamically.
> 4. The Java log writer plugged into Apache's custom log periodically
> closes the current log file and opens a new one. The log writer has two
> threads: one that writes data to an FSDataOutputStream, and another that
> wakes up periodically to close the old stream and open a new one (a
> sketch of this pattern follows below). I am trying to see if this is
> where file handles could be leaking.
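
A minimal sketch of the rotation pattern item 4 describes, written
against the 0.17-era FileSystem/FSDataOutputStream API. The class and
member names (HdfsLogRoller, rollIntervalMs, logDir) are hypothetical,
not the actual log writer; the point is that the old stream must be
closed on every path of the swap, otherwise its descriptors stay open
and accumulate:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Illustrative sketch only -- names are hypothetical. */
public class HdfsLogRoller implements Runnable {
  private final FileSystem fs;
  private final Path logDir;
  private final long rollIntervalMs;
  private FSDataOutputStream current;          // guarded by 'this'

  public HdfsLogRoller(Configuration conf, Path logDir, long rollIntervalMs)
      throws IOException {
    this.fs = FileSystem.get(conf);
    this.logDir = logDir;
    this.rollIntervalMs = rollIntervalMs;
    this.current = fs.create(new Path(logDir, "log." + System.currentTimeMillis()));
  }

  /** Writer thread calls this for every record. */
  public synchronized void write(byte[] record) throws IOException {
    current.write(record);
  }

  /** Roller thread: open the new file, swap it in, then close the old
   *  stream. If the close is skipped or its exception aborts the loop,
   *  the old stream's descriptors are never released -- a classic fd leak. */
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(rollIntervalMs);
        FSDataOutputStream next =
            fs.create(new Path(logDir, "log." + System.currentTimeMillis()));
        FSDataOutputStream old;
        synchronized (this) {
          old = current;
          current = next;
        }
        try {
          old.close();
        } catch (IOException e) {
          // log and keep rolling; the swap already happened
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } catch (IOException e) {
        // log and keep rolling
      }
    }
  }
}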
> 
> Another thing to note is that we have a signal handler implementation
> that uses the sun.misc package. The signal handler is installed in the
> Java processes to ensure that when Apache sends the Java process a
> SIGTERM or SIGINT, we close the file handles properly (a sketch of such
> a handler follows below).
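
A sketch of the kind of sun.misc-based handler described above. The
class name and the set of Closeables passed in are hypothetical;
sun.misc.Signal and sun.misc.SignalHandler are real but unsupported
Sun-internal APIs:

import java.io.Closeable;
import sun.misc.Signal;
import sun.misc.SignalHandler;

/** Illustrative sketch only. Closes the given streams when Apache sends
 *  SIGTERM or SIGINT, then chains to the previously installed handler. */
public class CleanShutdownHandler implements SignalHandler {
  private final Closeable[] resources;   // e.g. the open FSDataOutputStreams
  private SignalHandler previous;

  private CleanShutdownHandler(Closeable[] resources) {
    this.resources = resources;
  }

  /** Install for SIGTERM and SIGINT. */
  public static void install(Closeable... resources) {
    for (String name : new String[] { "TERM", "INT" }) {
      CleanShutdownHandler h = new CleanShutdownHandler(resources);
      h.previous = Signal.handle(new Signal(name), h);
    }
  }

  public void handle(Signal sig) {
    for (Closeable c : resources) {
      try {
        c.close();                       // best effort -- we are exiting
      } catch (Exception e) {
        // ignore during shutdown
      }
    }
    if (previous != null && previous != SIG_DFL && previous != SIG_IGN) {
      previous.handle(sig);              // chain to the earlier handler
    } else {
      System.exit(0);                    // otherwise just exit
    }
  }
}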
> 
> I will do some more analysis of our code to find out whether the issue
> is in our code or in the HDFS client. If it turns out to be an HDFS
> client issue, I'll move this discussion to a Hadoop JIRA.
> 
> Thanks and Regards
> -Ankur
