I'm operating several two-node clusters (version 0.6.5) on VMs in our development and test environments.

After about a week of operation under similar conditions, one of them started throwing this:

WARN [main] 2010-10-12 08:08:31,245 (line 104) Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: Too many open files
        at org.apache.thrift.transport.TServerSocket.acceptImpl(
        at org.apache.thrift.transport.TServerSocket.acceptImpl(
        at org.apache.thrift.transport.TServerTransport.accept(
        at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(
        at org.apache.cassandra.thrift.CassandraDaemon.start(
        at org.apache.cassandra.thrift.CassandraDaemon.main(
Caused by: Too many open files
        at Method)
        at org.apache.thrift.transport.TServerSocket.acceptImpl(
        ... 5 more
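To confirm that a process really is at its descriptor limit, you can compare the number of entries in /proc/<pid>/fd with the "Max open files" soft limit. Here is a minimal sketch of that check. It assumes a Linux host with /proc mounted. `fd_usage` is a hypothetical helper name, and inspecting another user's process (e.g. the Cassandra JVM) usually requires root.

```python
import os
import sys


def fd_usage(pid: int) -> tuple:
    """Return (open_fds, soft_limit) for a Linux process.

    Assumes Linux /proc. soft_limit is None when reported as "unlimited".
    """
    # Each entry in /proc/<pid>/fd is one open descriptor (sockets included).
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Line looks like: "Max open files  1024  4096  files"
                field = line.split()[3]
                soft_limit = int(field) if field.isdigit() else None
                break
    return open_fds, soft_limit


if __name__ == "__main__":
    # Pass the JVM's pid as an argument; defaults to this script's own pid.
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    used, limit = fd_usage(target)
    print(f"pid {target}: {used} open fds, soft limit {limit}")
```

If the first number is at or near the second, every new accept() will fail the way the trace above shows.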

I found that the offending node had hundreds of sockets in CLOSE_WAIT state on the StoragePort (between the two nodes), which pushed the process up to its file-descriptor limit, so new connections could no longer be accepted. It looks similar to the problem originally described (but never resolved) several months ago in this thread:
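For anyone who wants to check their own nodes for the same symptom, here is a minimal sketch that counts CLOSE_WAIT sockets on a given port by reading the kernel's TCP tables, roughly what `netstat -tan` reports. Assumptions: a Linux host with /proc/net/tcp and /proc/net/tcp6; a StoragePort of 7000 (the default as far as I know, so check your storage-conf.xml); and the hypothetical helper names `parse_proc_net_tcp`, `count_close_wait` and `read_tables`. It counts sockets from all processes, not just Cassandra's.

```python
import collections
import os

CLOSE_WAIT = "08"  # TCP state code for CLOSE_WAIT in /proc/net/tcp


def parse_proc_net_tcp(text: str):
    """Yield (local_port, remote_port, state) from /proc/net/tcp{,6} content."""
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        # Addresses are HEXIP:HEXPORT; the port is the hex after the colon.
        yield (int(local.rsplit(":", 1)[1], 16),
               int(remote.rsplit(":", 1)[1], 16),
               state)


def count_close_wait(port: int, texts) -> collections.Counter:
    """Count CLOSE_WAIT sockets whose local or remote port equals `port`."""
    counts = collections.Counter()
    for text in texts:
        for lport, rport, state in parse_proc_net_tcp(text):
            if state != CLOSE_WAIT:
                continue
            if lport == port:
                counts["inbound (local port)"] += 1
            elif rport == port:
                counts["outbound (remote port)"] += 1
    return counts


def read_tables():
    """Yield the raw contents of the IPv4 and IPv6 TCP tables, if present."""
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):
            with open(path) as f:
                yield f.read()


if __name__ == "__main__":
    STORAGE_PORT = 7000  # assumption: default StoragePort
    print(dict(count_close_wait(STORAGE_PORT, read_tables())))
```

Splitting the count by direction shows whether the leaked sockets belong to connections this node accepted or to ones it opened to the other node.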

Has anyone else encountered this problem? I'm curious what might trigger it in one cluster but not in the others, which run in the same environment and are configured similarly.

Any insight would be appreciated.