cassandra-user mailing list archives

From David Leimbach <leim...@gmail.com>
Subject Re: Cassandra stucks
Date Fri, 11 May 2012 14:47:59 GMT
What's the version number of Cassandra?

On Fri, May 11, 2012 at 7:38 AM, Pavel Polushkin <ppolushkin@enkata.com> wrote:

> Hello,
>
> We ran into a strange problem while testing performance on a Cassandra
> cluster. After some time, all nodes went into the down state and stayed
> there for several days. Since then, all nodes have come back up except
> one, which is still down.
>
> Nodetool on the down node throws an exception:
>
> Error connection to remote JMX agent!
> java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
>         java.net.SocketTimeoutException: Read timed out]
>         at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:340)
>         at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
>         at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:144)
>         at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:114)
>         at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:623)
> Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
>         java.net.SocketTimeoutException: Read timed out]
>         at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101)
>         at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185)
>         at javax.naming.InitialContext.lookup(InitialContext.java:392)
>         at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1888)
>         at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1858)
>         at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257)
>         ... 4 more
> Caused by: java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
>         java.net.SocketTimeoutException: Read timed out
>         at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:286)
>         at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
>         at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
>         at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
>         at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97)
>         ... 9 more
> Caused by: java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>         at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:228)
>         ... 13 more
>
> In the system log of the down node there is an endless list of entries
> like these:
>
> INFO [GossipStage:1] 2012-05-10 23:18:27,579 Gossiper.java (line 804) InetAddress /172.15.2.161 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.162 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.163 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.165 is now UP
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.161 is now dead.
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.165 is now dead.
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.162 is now dead.
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.163 is now dead.
> INFO [GossipStage:1] 2012-05-10 23:18:29,291 Gossiper.java (line 804) InetAddress /172.15.2.161 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.162 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.163 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.165 is now UP
>
> The suspicious thing is that this node has several TCP connections with
> other nodes on port 7000 in the CLOSE_WAIT state:
>
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State
> tcp   869073      0 rcwocas:afs3-fileserver rcwocas03.enkata.:34274 CLOSE_WAIT
> tcp   463429      0 rcwocas:afs3-fileserver rcwocas02.enkata.:39654 CLOSE_WAIT
> tcp   873838      0 rcwocas:afs3-fileserver rcwocas01.enkata.:49486 CLOSE_WAIT
> tcp   860245      0 rcwocas:afs3-fileserver rcwocas05.enkata.:43028 CLOSE_WAIT
> tcp      112      0 rcwocas:afs3-fileserver rcwocas02.enkata.:40321 CLOSE_WAIT
> tcp     2124      0 rcwocas:afs3-fileserver rcwocas03.enkata.:39338 CLOSE_WAIT
> tcp        0      0 rcwocas:afs3-fileserver rcwocas01.enkata.:56408 ESTABLISHED
> tcp      184      0 rcwocas:afs3-fileserver rcwocas01.enkata.:48862 CLOSE_WAIT
> tcp   534489      0 rcwocas:afs3-fileserver rcwocas02.enkata.:35331 ESTABLISHED
> tcp      886      0 rcwocas:afs3-fileserver rcwocas03.enkata.:56034 CLOSE_WAIT
> tcp        0      0 rcwocas04.Enkata.:48800 rcwocas:afs3-fileserver ESTABLISHED
> tcp        0      0 rcwocas:afs3-fileserver rcwocas01.enkata.:51348 ESTABLISHED
> tcp      187      0 rcwocas:afs3-fileserver rcwocas05.enkata.:45538 CLOSE_WAIT
> tcp      253      0 rcwocas:afs3-fileserver rcwocas03.enkata.:51359 CLOSE_WAIT
>
> Also, I have attached a thread dump.
>
> Thanks,
> Pavel
