hadoop-common-user mailing list archives

From 何永强 <heyongqi...@software.ict.ac.cn>
Subject Re: IPC Client error | Too many files open
Date Wed, 08 Oct 2008 03:12:22 GMT
Try updating the JDK to 1.6; there is a bug in JDK 1.5's NIO implementation.
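The pipe/eventpoll pairs in the lsof output quoted below are exactly what an epoll-based NIO selector holds open on Linux (note the sun.nio.ch.IOUtil.initPipe frame in the stack trace), so leaked selectors are a plausible culprit. A minimal sketch, not from the original code, that reproduces the symptom by opening selectors without closing them:

    // SelectorLeakDemo.java -- on Linux, each Selector.open() allocates a
    // pipe (one read fd, one write fd) plus an eventpoll fd. Without a
    // matching close(), the process eventually fails with
    // "java.io.IOException: Too many open files".
    import java.io.IOException;
    import java.nio.channels.Selector;

    public class SelectorLeakDemo {
        public static void main(String[] args) throws IOException {
            int count = 0;
            while (true) {
                Selector.open();        // deliberately never closed
                count++;
                if (count % 100 == 0) {
                    System.out.println("open selectors: " + count);
                }
            }
        }
    }

Running this under "lsof -p <pid>" shows the same growing pipe/eventpoll triples as in the report below.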
On 2008-9-26, at 7:29 PM, Goel, Ankur wrote:

> Hi Folks,
>
>     We have developed a simple log writer in Java that is plugged into
> Apache custom log and writes log entries directly to our Hadoop cluster
> (50 machines, quad core, each with 16 GB RAM and an 800 GB hard disk; 1
> machine as a dedicated NameNode, another machine as JobTracker &
> TaskTracker + DataNode).
>
> There are around 8 Apache servers dumping logs into HDFS via our writer.
> Everything was working fine, and we were getting around 15-20 MB of log
> data per hour from each server.
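A writer like this is typically a thin wrapper over the standard Hadoop FileSystem API. A minimal sketch follows; the class name, config, and path layout are my assumptions for illustration, not the poster's actual code:

    // HdfsLogWriter.java -- illustrative sketch of a direct-to-HDFS log
    // writer using the stock FileSystem API (Hadoop 0.18-era config key).
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsLogWriter {
        private final FileSystem fs;
        private final FSDataOutputStream out;

        public HdfsLogWriter(String pathInHdfs) throws IOException {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://hadoop-server.com:9000");
            fs = FileSystem.get(conf);
            // one file held open per writer process for its lifetime
            out = fs.create(new Path(pathInHdfs));
        }

        public synchronized void writeLine(String logLine) throws IOException {
            out.write((logLine + "\n").getBytes("UTF-8"));
        }

        public synchronized void close() throws IOException {
            out.close();   // unclosed streams keep sockets/selectors alive
            fs.close();
        }
    }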
>
>
>
> Recently we have been experiencing problems with 2-3 of our Apache
> servers, where a file is opened by the log writer in HDFS for writing
> but never receives any data.
>
> Looking at the Apache error logs shows the following errors:
>
> 08/09/22 05:02:13 INFO ipc.Client: java.io.IOException: Too many open files
>         at sun.nio.ch.IOUtil.initPipe(Native Method)
>         at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
>         at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:312)
>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:227)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:149)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:122)
>         at java.io.FilterInputStream.read(FilterInputStream.java:116)
>         at org.apache.hadoop.ipc.Client$Connection$1.read(Client.java:203)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:289)
>
>         ...
>
> This is followed by connection errors saying:
>
> "Retrying to connect to server: hadoop-server.com:9000. Already tried
> 'n' times."
>
> ... (same as above) ...
>
> and the writer retries constantly (it is set up so that it waits and
> retries).
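One thing worth auditing in such a retry loop: if each failed attempt allocates fresh connections or selectors and the failure path never releases them, descriptors accumulate with every retry. A hedged sketch of the pattern (openWithRetry, and HdfsLogWriter from the sketch above, are illustrative names, not the actual code):

    // RetryingOpener.java -- waits and retries like the poster's writer.
    // The thing to check is whether the catch block leaks half-open
    // resources; over thousands of retries that exhausts the fd limit.
    import java.io.IOException;

    public class RetryingOpener {
        static HdfsLogWriter openWithRetry(String path) throws InterruptedException {
            while (true) {
                try {
                    return new HdfsLogWriter(path);
                } catch (IOException e) {
                    System.err.println("open failed, will retry: " + e);
                    Thread.sleep(10000L);  // back off before the next attempt
                }
            }
        }
    }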
>
>
>
> Doing an lsof on the log writer Java process shows that it got stuck
> on a lot of pipe/eventpoll descriptors and eventually ran out of file
> handles.
>
> Below is part of the lsof output:
>
>
>
> lsof -p 2171
> COMMAND  PID  USER  FD   TYPE  DEVICE  SIZE  NODE      NAME
> ...
> java    2171  root  20r  FIFO  0,7           24090207  pipe
> java    2171  root  21w  FIFO  0,7           24090207  pipe
> java    2171  root  22r  0000  0,8        0  24090208  eventpoll
> java    2171  root  23r  FIFO  0,7           23323281  pipe
> java    2171  root  24r  FIFO  0,7           23331536  pipe
> java    2171  root  25w  FIFO  0,7           23306764  pipe
> java    2171  root  26r  0000  0,8        0  23306765  eventpoll
> java    2171  root  27r  FIFO  0,7           23262160  pipe
> java    2171  root  28w  FIFO  0,7           23262160  pipe
> java    2171  root  29r  0000  0,8        0  23262161  eventpoll
> java    2171  root  30w  FIFO  0,7           23299329  pipe
> java    2171  root  31r  0000  0,8        0  23299330  eventpoll
> java    2171  root  32w  FIFO  0,7           23331536  pipe
> java    2171  root  33r  FIFO  0,7           23268961  pipe
> java    2171  root  34w  FIFO  0,7           23268961  pipe
> java    2171  root  35r  0000  0,8        0  23268962  eventpoll
> java    2171  root  36w  FIFO  0,7           23314889  pipe
>
> ...
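Given output like this, a quick way to watch descriptor growth from inside the JVM is to count the entries in /proc/self/fd (a Linux-specific sketch):

    // FdCount.java -- on Linux, /proc/self/fd holds one entry per open
    // descriptor, so listing it approximates the process's open-file count.
    import java.io.File;

    public class FdCount {
        public static int openFds() {
            String[] entries = new File("/proc/self/fd").list();
            return entries == null ? -1 : entries.length;
        }

        public static void main(String[] args) {
            System.out.println("open fds: " + openFds());
        }
    }

Logging this periodically from the writer would show whether handles grow steadily or only after connection failures.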
>
> What in the DFS client (if any) could have caused this? Could it be
> something else?
>
> Is it not ideal to use an HDFS writer to write logs directly from
> Apache into HDFS?
>
> Is Chukwa (the Hadoop log collection and analysis framework
> contributed by Yahoo) a better fit for our case?
>
>
>
> I would greatly appreciate help on any or all of the above questions.
>
>
>
> Thanks and Regards
>
> -Ankur
>

