flume-user mailing list archives

From alo alt <wget.n...@googlemail.com>
Subject Re: Collector node failing with java.net.SocketException: Too many open files
Date Fri, 27 Jan 2012 09:40:09 GMT
Hi,

# cat /etc/security/limits.conf
 flume            soft     nofile         5000
 flume            hard     nofile         5000
 

# cat /etc/sysctl.conf
 fs.file-max=200000

Can you try these settings?

The default limit of 1024 open files is sized for small servers and desktop PCs.
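
To confirm that the new limit actually reaches the running collector, it helps to compare the process's effective soft limit with the number of descriptors it currently holds. The following is a minimal sketch, assuming a Linux host with /proc mounted; the function name fd_usage is made up for this example, and you would pass the collector's PID instead of the shell's own. Entries in limits.conf are normally applied through PAM at login, so a daemon started another way may still run with the old value.

```shell
# Print the soft "Max open files" limit and the current descriptor count
# for a given PID. Linux only: reads /proc/<pid>/limits and /proc/<pid>/fd.
# fd_usage is a hypothetical helper name, not part of Flume.
fd_usage() {
    pid="$1"
    # The 4th field of the "Max open files" row is the soft limit.
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    # Each entry in /proc/<pid>/fd is one open descriptor.
    used=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    echo "pid=$pid open=$used soft_limit=$limit"
}

# Example: inspect the current shell. Replace $$ with the collector's PID.
fd_usage "$$"
```

If the open count keeps climbing toward the soft limit while the agents each show a single connection, raising the limit only buys time and the leak itself still needs to be tracked down.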

- Alex 

--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 26, 2012, at 6:04 PM, Frank Grimes wrote:

> It's 1024, but we really shouldn't need to up that value... doing so would just delay the failure.
> 
> 
> On 2012-01-26, at 11:57 AM, Zijad Purkovic wrote:
> 
>> Hi Frank,
>> 
>> Can you show output of ulimit -n from your collector node?
>> 
>> On Thu, Jan 26, 2012 at 4:51 PM, Frank Grimes <frankgrimes97@yahoo.com> wrote:
>>> Hi All,
>>> 
>>> We are using flume-0.9.5
>>> (specifically, http://svn.apache.org/repos/asf/incubator/flume/trunk@1179275)
>>> and occasionally our Collector node accumulates too many open TCP
>>> connections and starts madly logging the following errors:
>>> 
>>> WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error
>>> occurred during acceptance of message.
>>> org.apache.thrift.transport.TTransportException: java.net.SocketException:
>>> Too many open files
>>>       at
>>> org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
>>>       at
>>> org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>>>       at
>>> org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
>>> Caused by: java.net.SocketException: Too many open files
>>>       at java.net.PlainSocketImpl.socketAccept(Native Method)
>>>       at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
>>>       at java.net.ServerSocket.implAccept(ServerSocket.java:462)
>>>       at java.net.ServerSocket.accept(ServerSocket.java:430)
>>>       at
>>> org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
>>>       ... 2 more
>>> 
>>> 
>>> This quickly fills up the disk as the log file grows to multiple gigabytes
>>> in size.
>>> 
>>> After some investigation, it appears that even though the Agent nodes show
>>> single open connections to the Collector, the Collector node appears to have
>>> a bunch of zombie TCP connections open back to the Agent nodes.
>>> i.e.
>>> "lsof -n | grep PORT" on the Agent node shows 1 established connection
>>> However, the Collector node shows hundreds of established connections for
>>> that same port which don't seem to tie up to any connections I can find on
>>> the Agent node.
>>> 
>>> So we're concluding that the Collector node is somehow leaking connections.
>>> 
>>> Has anyone seen this kind of thing before?
>>> 
>>> Could this be related to https://issues.apache.org/jira/browse/FLUME-857?
>>> Or could this be a Thrift bug that could be avoided by switching to Avro
>>> sources/sinks?
>>> 
>>> Any hints/tips are most welcome.
>>> 
>>> Thanks,
>>> 
>>> Frank Grimes
>> 
>> 
>> 
>> -- 
>> Zijad Purković
>> Dobrovoljnih davalaca krvi 3/19, Zavidovići
>> 061/ 690 - 241
> 

