hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: IPC Client error | Too many files open
Date Fri, 26 Sep 2008 15:40:55 GMT

What does jstack show for this?
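For example, "jstack 2171" (the writer's pid from the lsof output below),
taken a few times while the process is wedged, would show where the
client threads are blocked.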

Probably better suited for a jira discussion.
Raghu.
Goel, Ankur wrote:
> Hi Folks,
> 
>     We have developed a simple log writer in Java that is plugged into
> Apache's custom log and writes log entries directly to our Hadoop
> cluster (50 machines, quad-core, each with 16 GB RAM and an 800 GB
> hard disk; 1 machine as a dedicated NameNode, another machine as
> JobTracker & TaskTracker + DataNode).
> 
> There are around 8 Apache servers dumping logs into HDFS via our
> writer. Everything was working fine, and we were getting around 15-20
> MB of log data per hour from each server.
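> 
> For illustration, the writer boils down to a loop like the sketch below
> (class name and output path are simplified, not our exact code; retry
> handling is omitted):
> 
>   import java.io.BufferedReader;
>   import java.io.InputStreamReader;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FSDataOutputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
> 
>   // Piped log writer: reads Apache log lines on stdin and writes
>   // them to a file in HDFS.
>   public class HdfsLogWriter {
>       public static void main(String[] args) throws Exception {
>           Configuration conf = new Configuration();  // reads hadoop-site.xml
>           FileSystem fs = FileSystem.get(conf);      // single shared client
>           FSDataOutputStream out = fs.create(new Path("/logs/access.log"));
>           BufferedReader in =
>               new BufferedReader(new InputStreamReader(System.in));
>           String line;
>           while ((line = in.readLine()) != null) {
>               out.writeBytes(line + "\n");           // one log entry per line
>           }
>           out.close();  // releases the HDFS lease and client sockets
>           fs.close();
>       }
>   }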
> 
>  
> 
> Recently we have been experiencing problems with 2-3 of our Apache
> servers, where the log writer opens a file in HDFS for writing but the
> file never receives any data.
> 
> Looking at the Apache error logs shows the following errors:
> 
> 08/09/22 05:02:13 INFO ipc.Client: java.io.IOException: Too many open files
>         at sun.nio.ch.IOUtil.initPipe(Native Method)
>         at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
>         at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:312)
>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:227)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:149)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:122)
>         at java.io.FilterInputStream.read(FilterInputStream.java:116)
>         at org.apache.hadoop.ipc.Client$Connection$1.read(Client.java:203)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:289)
> 
>         ...
> 
>         ...
> 
> This is followed by connection errors saying
> 
> "Retrying to connect to server: hadoop-server.com:9000. Already tried
> 'n' times."
> 
> (same as above) ...
> 
> ....
> 
> and the writer keeps retrying constantly (it is set up to wait and
> retry on failure).
> 
>  
> 
> Running lsof on the log-writer Java process shows that it has
> accumulated a large number of pipe/eventpoll descriptors and eventually
> ran out of file handles.
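> 
> (Presumably the process is hitting the default per-process descriptor
> limit; "ulimit -n", typically 1024 on Linux, will show it.)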
> 
> Below is part of the lsof output:
> 
>  
> 
> lsof -p 2171
> COMMAND  PID USER   FD   TYPE             DEVICE     SIZE     NODE NAME
> ....
> 
> ....
> java    2171 root   20r  FIFO                0,7          24090207 pipe
> java    2171 root   21w  FIFO                0,7          24090207 pipe
> java    2171 root   22r  0000                0,8        0 24090208 eventpoll
> java    2171 root   23r  FIFO                0,7          23323281 pipe
> java    2171 root   24r  FIFO                0,7          23331536 pipe
> java    2171 root   25w  FIFO                0,7          23306764 pipe
> java    2171 root   26r  0000                0,8        0 23306765 eventpoll
> java    2171 root   27r  FIFO                0,7          23262160 pipe
> java    2171 root   28w  FIFO                0,7          23262160 pipe
> java    2171 root   29r  0000                0,8        0 23262161 eventpoll
> java    2171 root   30w  FIFO                0,7          23299329 pipe
> java    2171 root   31r  0000                0,8        0 23299330 eventpoll
> java    2171 root   32w  FIFO                0,7          23331536 pipe
> java    2171 root   33r  FIFO                0,7          23268961 pipe
> java    2171 root   34w  FIFO                0,7          23268961 pipe
> java    2171 root   35r  0000                0,8        0 23268962 eventpoll
> java    2171 root   36w  FIFO                0,7          23314889 pipe
> 
> ...
> 
> ...
> 
> ...
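> 
> Note that each java.nio selector shows up above as a pipe pair plus an
> eventpoll descriptor, so something like "lsof -p 2171 | grep -c
> eventpoll" gives a rough count of the selectors still alive.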
> 
> What in the DFS client (if anything) could have caused this? Could it
> be something else?
> 
> Is it a bad idea to write logs from Apache directly into HDFS with a
> writer like this?
> 
> Is Chukwa (the Hadoop log collection and analysis framework
> contributed by Yahoo!) a better fit for our case?
> 
>  
> 
> I would greatly appreciate help with any or all of the above questions.
> 
>  
> 
> Thanks and Regards
> 
> -Ankur
> 
> 

