cassandra-user mailing list archives

From Adam Holmberg <adam.holmberg.l...@gmail.com>
Subject StoragePort Socket Leak 0.6.5
Date Fri, 15 Oct 2010 19:34:21 GMT
Greetings.

I'm operating several two-node clusters (version 0.6.5) on VMs in our
development and test environments.

After about a week of operation under similar conditions, one of them
started throwing this:

WARN [main] 2010-10-12 08:08:31,245 CustomTThreadPoolServer.java (line 104)
Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException:
Too many open files
        at
org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124)
        at
org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
        at
org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
        at
org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98)
        at
org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:186)
        at
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:227)
Caused by: java.net.SocketException: Too many open files
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
        at java.net.ServerSocket.implAccept(ServerSocket.java:453)
        at java.net.ServerSocket.accept(ServerSocket.java:421)
        at
org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119)
        ... 5 more

I found that the offending node had hundreds of sockets (on the StoragePort,
between the two nodes) in CLOSE_WAIT state, which was causing new
connections to bump into the fd limit. It seems similar to what was
described (but never resolved) several months ago in this thread:

http://www.mail-archive.com/user@cassandra.apache.org/msg01381.html
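For reference, this is roughly how I counted the leaked sockets and checked the process's descriptor usage. A sketch, assuming Linux and the default StoragePort of 7000 (hex 1B58 in /proc/net/tcp); substitute your own port and the Cassandra JVM's pid:

```shell
# Count sockets in CLOSE_WAIT (state 08 in /proc/net/tcp) whose local
# port is 7000 (hex 1B58). Reads /proc/net/tcp directly, so it works
# even on boxes without netstat installed.
awk 'NR > 1 { split($2, a, ":"); if (a[2] == "1B58" && $4 == "08") n++ }
     END { print n + 0 }' /proc/net/tcp

# Compare the Cassandra JVM's open descriptor count against its limit
# ($CASSANDRA_PID is a placeholder for the actual pid, e.g. from jps)
ls /proc/$CASSANDRA_PID/fd | wc -l
grep 'open files' /proc/$CASSANDRA_PID/limits
```

On the affected node the first command returned hundreds, while the healthy nodes stayed near zero.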

Has anyone else encountered this problem? I am curious about what might
trigger this in one cluster and not in the others, which operate in the same
environment and are configured similarly.

Any insight would be appreciated.

Thanks,
Adam
