cassandra-user mailing list archives

From Madalina Matei <madalinaima...@gmail.com>
Subject Re: Cassandra stucks
Date Fri, 11 May 2012 15:39:50 GMT
Are you using EC2?

On 11 May 2012, at 16:13, Pavel Polushkin wrote:

> We use version 1.0.8.
>  
> From: David Leimbach [mailto:leimy2k@gmail.com] 
> Sent: Friday, May 11, 2012 18:48
> To: user@cassandra.apache.org
> Subject: Re: Cassandra stucks
>  
> What's the version number of Cassandra?
> 
> On Fri, May 11, 2012 at 7:38 AM, Pavel Polushkin <ppolushkin@enkata.com> wrote:
> Hello,
>
> We ran into a strange problem while testing performance on a Cassandra cluster. After some time, all nodes went DOWN for several days. Now all nodes are back UP except one, which is still down.
>
> Running nodetool on the down node throws this exception:
>
> Error connection to remote JMX agent!
> java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
>         java.net.SocketTimeoutException: Read timed out]
>         at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:340)
>         at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
>         at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:144)
>         at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:114)
>         at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:623)
> Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
>         java.net.SocketTimeoutException: Read timed out]
>         at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101)
>         at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185)
>         at javax.naming.InitialContext.lookup(InitialContext.java:392)
>         at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1888)
>         at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1858)
>         at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257)
>         ... 4 more
> Caused by: java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
>         java.net.SocketTimeoutException: Read timed out
>         at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:286)
>         at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
>         at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
>         at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
>         at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97)
>         ... 9 more
> Caused by: java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>         at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:228)
>         ... 13 more
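The innermost frames show the timeout inside `DataInputStream.readByte` during JRMP connection establishment, which suggests the JMX port accepted the TCP connection but the JVM never sent the protocol acknowledgement back. A minimal Java 8+ sketch for checking JMX responsiveness with a hard deadline is shown below. It assumes Cassandra's default JMX port 7199 (override it if `JMX_PORT` is changed in cassandra-env.sh); the class `JmxProbe` and its methods are hypothetical and are not part of nodetool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class JmxProbe {
    // Assumption: Cassandra's default JMX port; change it if JMX_PORT was overridden.
    static final int DEFAULT_JMX_PORT = 7199;

    // Builds the standard RMI/JNDI JMX service URL for host:port.
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Returns true only if an MBean server answered within timeoutMs; any error or timeout yields false.
    static boolean probe(String host, int port, long timeoutMs) {
        // Daemon worker thread so a hung RMI read cannot keep the JVM alive after we give up.
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "jmx-probe");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<Integer> result = worker.submit(() -> {
                try (JMXConnector connector =
                         JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl(host, port)))) {
                    // A cheap round trip proving the remote MBean server is actually serving requests.
                    return connector.getMBeanServerConnection().getMBeanCount();
                }
            });
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (Exception e) {
            // TimeoutException, ExecutionException (wrapping the IOException) or InterruptedException.
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            return false;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        System.out.println(host + " JMX responsive: " + probe(host, DEFAULT_JMX_PORT, 5000));
    }
}
```

Running the probe in a separate thread with `Future.get(timeout, unit)` bounds the wait even when the remote side accepts the socket and then goes silent, which is the behaviour the stack trace points to.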
>
> The system log of the down node contains an endless stream of messages like these:
>
> INFO [GossipStage:1] 2012-05-10 23:18:27,579 Gossiper.java (line 804) InetAddress /172.15.2.161 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.162 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.163 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.165 is now UP
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.161 is now dead.
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.165 is now dead.
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.162 is now dead.
> INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.163 is now dead.
> INFO [GossipStage:1] 2012-05-10 23:18:29,291 Gossiper.java (line 804) InetAddress /172.15.2.161 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.162 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.163 is now UP
> INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.165 is now UP
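In this excerpt the same four peers go UP, dead and UP again within about two seconds. To quantify that flapping over a longer stretch of system.log, a small Java sketch can count the two Gossiper messages per address. The class `GossipFlapCounter` is hypothetical, and the regular expression assumes the message text matches the lines quoted here exactly; other Cassandra versions may word them differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GossipFlapCounter {
    // Matches Gossiper state-change messages such as "InetAddress /172.15.2.161 is now UP".
    private static final Pattern STATE_CHANGE =
            Pattern.compile("InetAddress /(\\S+) is now (UP|dead)");

    // Maps each peer address to {number of "UP" events, number of "dead" events}.
    static Map<String, int[]> count(String log) {
        Map<String, int[]> counts = new TreeMap<>();
        Matcher m = STATE_CHANGE.matcher(log);
        while (m.find()) { // find() also handles several messages run together on one line
            int[] c = counts.computeIfAbsent(m.group(1), addr -> new int[2]);
            c["UP".equals(m.group(2)) ? 0 : 1]++;
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the node's system.log (its location depends on the log4j configuration).
        String log = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        count(log).forEach((addr, c) ->
                System.out.println(addr + "  up=" + c[0] + "  dead=" + c[1]));
    }
}
```

Roughly equal, large UP and dead counts for every peer would confirm that the node keeps marking the whole ring dead and alive again, rather than a single peer being unreachable.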
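>
> What looks suspicious is that this node has several TCP connections with other nodes on port 7000 (Cassandra's inter-node storage port, which netstat shows by its /etc/services name, afs3-fileserver) stuck in the CLOSE_WAIT state: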
>
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State
> tcp   869073      0 rcwocas:afs3-fileserver rcwocas03.enkata.:34274 CLOSE_WAIT
> tcp   463429      0 rcwocas:afs3-fileserver rcwocas02.enkata.:39654 CLOSE_WAIT
> tcp   873838      0 rcwocas:afs3-fileserver rcwocas01.enkata.:49486 CLOSE_WAIT
> tcp   860245      0 rcwocas:afs3-fileserver rcwocas05.enkata.:43028 CLOSE_WAIT
> tcp      112      0 rcwocas:afs3-fileserver rcwocas02.enkata.:40321 CLOSE_WAIT
> tcp     2124      0 rcwocas:afs3-fileserver rcwocas03.enkata.:39338 CLOSE_WAIT
> tcp        0      0 rcwocas:afs3-fileserver rcwocas01.enkata.:56408 ESTABLISHED
> tcp      184      0 rcwocas:afs3-fileserver rcwocas01.enkata.:48862 CLOSE_WAIT
> tcp   534489      0 rcwocas:afs3-fileserver rcwocas02.enkata.:35331 ESTABLISHED
> tcp      886      0 rcwocas:afs3-fileserver rcwocas03.enkata.:56034 CLOSE_WAIT
> tcp        0      0 rcwocas04.Enkata.:48800 rcwocas:afs3-fileserver ESTABLISHED
> tcp        0      0 rcwocas:afs3-fileserver rcwocas01.enkata.:51348 ESTABLISHED
> tcp      187      0 rcwocas:afs3-fileserver rcwocas05.enkata.:45538 CLOSE_WAIT
> tcp      253      0 rcwocas:afs3-fileserver rcwocas03.enkata.:51359 CLOSE_WAIT
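CLOSE_WAIT means the remote peer has closed its end while the local process has not yet closed its socket, and a non-zero Recv-Q means bytes were received but never read. A hedged Java sketch that filters netstat output for such sockets on the storage port follows; it assumes the six-column Linux net-tools layout shown above, and the class `CloseWaitScan` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class CloseWaitScan {
    // One TCP row of netstat output: Proto Recv-Q Send-Q Local-Address Foreign-Address State
    static final class Row {
        final long recvQ;
        final String local, foreign, state;

        Row(long recvQ, String local, String foreign, String state) {
            this.recvQ = recvQ;
            this.local = local;
            this.foreign = foreign;
            this.state = state;
        }
    }

    // Keeps only lines whose first column starts with "tcp"; headers and other protocols are skipped.
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 6 || !f[0].startsWith("tcp")) continue;
            rows.add(new Row(Long.parseLong(f[1]), f[3], f[4], f[5]));
        }
        return rows;
    }

    // CLOSE_WAIT sockets on the given local port ("7000" with netstat -n, else "afs3-fileserver")
    // that still hold unread bytes: the peer has closed, but this process has not drained or closed them.
    static List<Row> stuck(List<Row> rows, String localPort) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            if ("CLOSE_WAIT".equals(r.state) && r.local.endsWith(":" + localPort) && r.recvQ > 0) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Usage (assumption: Linux net-tools netstat): netstat -tn | java CloseWaitScan 7000
        String port = args.length > 0 ? args[0] : "7000";
        List<String> lines = new BufferedReader(new InputStreamReader(System.in))
                .lines().collect(Collectors.toList());
        for (Row r : stuck(parse(lines), port)) {
            System.out.println(r.foreign + " Recv-Q=" + r.recvQ);
        }
    }
}
```

Applied to the table above with `afs3-fileserver` as the port, every CLOSE_WAIT row qualifies, since each one has a non-zero Recv-Q.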
>
> I have also attached a thread dump.
>
> Thanks,
>
> Pavel
