cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From aaron morton <aa...@thelastpickle.com>
Subject Re: Cassandra stucks
Date Mon, 14 May 2012 21:14:54 GMT
If you are on DSE I would try the DS forums first http://www.datastax.com/support-forums/

Cheers
 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/05/2012, at 1:11 AM, Pavel Polushkin wrote:

> Thank you for your reply Aaron!
>  
> Here is more details about our environment:
> 1.      OS – Debian 6.0.4
> root@rcwocas05:~# uname -a
> Linux rcwocas05 2.6.32-5-amd64 #1 SMP Thu Mar 22 17:26:33 UTC 2012 x86_64 GNU/Linux
> 2.      Java
> root@rcwocas05:~# java -version
> java version "1.6.0_26"
> Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
> 3.      Cassandra
> Datastax Enterprise – dse 2.0.1, built on Cassandra 1.0.8
>  
> Now I will try to answer your questions:
> Q: Where are you running nodetool from ?
> A: From the same node
>  
> Q: Do you have any switches or firewalls between the servers ? Could they be closing
connections?
> A: All nodes are on the same switch without firewalls. We are using nodetool –h localhost
ring, also cluster configured by IP addresses, so it’s seems that DNS should not be the
cause of this.
>  
> We didn’t find network problems on cluster. Here is ifconfig that shows that there
is no error or dropped packets:
> root@rcwocas05:~# ifconfig
> eth1      Link encap:Ethernet  HWaddr 00:25:90:63:16:c8
>           inet addr:172.15.2.165  Bcast:172.15.2.255  Mask:255.255.255.0
>           inet6 addr: fe80::225:90ff:fe63:16c8/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
>           RX packets:573854310 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:777260637 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:5649087010357 (5.1 TiB)  TX bytes:4024634337167 (3.6 TiB)
>  
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:926558 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:926558 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:220573098 (210.3 MiB)  TX bytes:220573098 (210.3 MiB)
>  
> Seems that network working fine, we tested ping then several nodes were in down state
and didn’t find any connection errors.
>  
> Q: Anything interesting with your networking setup
> A: We using standard Cassandra configuration. The only think we configured is IP’s
of cluster nodes.
>  
>  
> I would like to give more details:
>  
> 1.      Attaching Cassandra node thread dump when it in down state. You can find that
all “network” threads are in WAITING state. Here is example of such threads:
> "Thrift:730" daemon prio=10 tid=0x00007f15bc050800 nid=0x30c5 waiting on condition [0x00007f934e75b000]
>    java.lang.Thread.State: WAITING (parking)
>                 at sun.misc.Unsafe.park(Native Method)
>                 - parking to wait for  <0x00000006810ec428> (a java.util.concurrent.SynchronousQueue$TransferStack)
>                 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>                 at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:422)
>                 at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)
>                 at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:857)
>                 at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
>                 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>                 at java.lang.Thread.run(Thread.java:662)
> "WRITE-/172.15.2.162" daemon prio=10 tid=0x0000000000fc8800 nid=0x1bb4 waiting on condition
[0x00007f934d082000]
>    java.lang.Thread.State: WAITING (parking)
>                 at sun.misc.Unsafe.park(Native Method)
>                 - parking to wait for  <0x0000000681115318> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>                 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>                 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>                 at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>                 at org.apache.cassandra.net.OutboundTcpConnection.take(OutboundTcpConnection.java:198)
>                 at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:90)
> …
>  
> Here is logs from two nodes: Node1 and node 5 when Node5 goes to down state: Node 5 contains
exceptions while Node 1 only diagnose that Node 5 went to down state.
>  
> NODE 5 LOG:
> ---
> DEBUG [Thrift:873] 2012-05-14 05:44:03,066 StorageProxy.java (line212) Write timeout
java.util.concurrent.TimeoutException for one (or more) of:
> DEBUG [Thrift:873] 2012-05-14 05:44:03,066 CassandraServer.java (line648) ... timed out
> DEBUG [Thrift:873] 2012-05-14 05:44:03,066 CustomTThreadPoolServer.java (line 260) Thrift
transport error occurred during processing of message.
> org.apache.thrift.transport.TTransportException: Cannot read. Remote side has closed.
Tried to read 4 bytes, but only got 0 bytes. (This is
> often indicative of an internal error on the server side. Please check your server logs.)
>        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>        at org.apache.cassandra.thrift.Dse$Processor.process(Dse.java:908)
>        at com.datastax.bdp.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:250)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> DEBUG [Thread-3010] 2012-05-14 05:44:12,007 Gossiper.java (line 222) Reseting version
for /172.15.2.164
>  INFO [GossipTasks:1] 2012-05-14 05:44:12,135 Gossiper.java (line 818) InetAddress /172.15.2.161
is now dead.
> DEBUG [GossipTasks:1] 2012-05-14 05:44:12,135 MessagingService.java (line 221) Resetting
pool for /172.15.2.161
>  INFO [GossipTasks:1] 2012-05-14 05:44:12,135 Gossiper.java (line 818) InetAddress /172.15.2.162
is now dead.
> DEBUG [GossipTasks:1] 2012-05-14 05:44:12,135 MessagingService.java (line 221) Resetting
pool for /172.15.2.162
>  INFO [GossipTasks:1] 2012-05-14 05:44:12,136 Gossiper.java (line 818) InetAddress /172.15.2.163
is now dead.
> DEBUG [GossipTasks:1] 2012-05-14 05:44:12,136 MessagingService.java (line 221) Resetting
pool for /172.15.2.163
> DEBUG [Thread-3008] 2012-05-14 05:44:12,625 Gossiper.java (line 222) Reseting version
for /172.15.2.163
> DEBUG [Thread-3007] 2012-05-14 05:44:12,626 Gossiper.java (line 222) Reseting version
for /172.15.2.161
> DEBUG [Thread-3009] 2012-05-14 05:44:12,682 Gossiper.java (line 222) Reseting version
for /172.15.2.162
> DEBUG [WRITE-/172.15.2.163] 2012-05-14 05:44:13,136 OutboundTcpConnection.java (line
211) attempting to connect to
> /172.15.2.163 DEBUG [WRITE-/172.15.2.161] 2012-05-14 05:44:13,136 OutboundTcpConnection.java
(line 211) attempting to connect to
> /172.15.2.161  INFO [GossipTasks:1] 2012-05-14 05:44:14,136 Gossiper.java (line 818)
InetAddress /172.15.2.164 is now dead.
> DEBUG [GossipTasks:1] 2012-05-14 05:44:14,137 MessagingService.java (line 221) Resetting
pool for /172.15.2.164
> DEBUG [WRITE-/172.15.2.164] 2012-05-14 05:44:15,137 OutboundTcpConnection.java (line
211) attempting to connect to
> /172.15.2.164 DEBUG [WRITE-/172.15.2.162] 2012-05-14 05:44:17,138 OutboundTcpConnection.java
(line 211) attempting to connect to
> 
> --------
> NODE 1 LOG:
> -------
> DEBUG [MutationStage:7771] 2012-05-14 05:44:03,068 Table.java (line 387) applying mutation
of row
> 32303132303231305f677569645f66697273745f31303634323231 DEBUG [Thrift:317] 2012-05-14
05:44:03,068 StorageProxy.java (line
> 212) Write timeout java.util.concurrent.TimeoutException for one (or more) of: DEBUG
[Thrift:317] 2012-05-14 05:44:03,068 CassandraServer.java (line 648) ... timed out
> DEBUG [Thrift:317] 2012-05-14 05:44:03,069 CustomTThreadPoolServer.java (line 260) Thrift
transport error occurred during processing of message.
> org.apache.thrift.transport.TTransportException: Cannot read. Remote side has closed.
Tried to read 4 bytes, but only got 0 bytes. (This is often indicative of an internal error
on the server side. Please check your server logs.)
>        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>        at org.apache.cassandra.thrift.Dse$Processor.process(Dse.java:908)
>        at com.datastax.bdp.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:250)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> DEBUG [Thrift:317] 2012-05-14 05:44:03,070 ClientState.java (line 104) logged out: #<User
allow_all groups=[]>
> DEBUG [RequestResponseStage:7161] 2012-05-14 05:44:03,073 ResponseVerbHandler.java (line
44) Processing response on a callback from 3318177@/172.15.2.164 DEBUG [MutationStage:7772]
2012-05-14 05:44:03,079
> RowMutationVerbHandler.java (line 44) Applying RowMutation(keyspace='Performance90',
> key='32303132303231305f677569645f66697273745f3130363 4323230', modifications=[ColumnFamily(VideoMedia20120210
[file:false:31@1336999691332000,media:false:1048576@1336999691332001,])]) 
> DEBUG [MutationStage:7772] 2012-05-14 05:44:03,079 Table.java (line 387) applying mutation
of row 32303132303231305f677569645f66697273745f31303634323230
> DEBUG [MutationStage:7772] 2012-05-14 05:44:03,081 RowMutationVerbHandler.java (line
56) RowMutation(keyspace='Performance90',key='32303132303231305f677569645f66697273745f31303634323230',
modifications=[ColumnFamily(VideoMedia20120210[file:false:31@1336999691332000,media:false:1048576@1336999691332001,])])
> applied.  Sending response to 3167090@/172.15.2.162
> DEBUG [MutationStage:7773] 2012-05-14 05:44:03,084 RowMutationVerbHandler.java (line
44) Applying RowMutation(keyspace='Performance90', key='32303132303231305f677569645f66697273745f3130363
4323136', modifications=[ColumnFamily(VideoMedia20120210 [file:false:31@1336999691335000,media:false:1048576@1336999691335001,])])
> DEBUG [MutationStage:7773] 2012-05-14 05:44:03,084 Table.java (line 387) applying mutation
of row 32303132303231305f677569645f66697273745f31303634323136
> DEBUG [MutationStage:7773] 2012-05-14 05:44:03,085 RowMutationVerbHandler.java (line
56) RowMutation(keyspace='Performance90', key='32303132303231305f677569645f66697273745f31303634323136',modifications=[ColumnFamily(VideoMedia20120210
> [file:false:31@1336999691335000,media:false:1048576@1336999691335001,])])
> applied.  Sending response to 3074618@/172.15.2.164
> DEBUG [MutationStage:7774] 2012-05-14 05:44:07,026 StorageProxy.java (line 357) Adding
hint for /172.15.2.165
> DEBUG [MutationStage:7774] 2012-05-14 05:44:07,026 Table.java (line 387) applying mutation
of row 66666666666666666666666666666664
> DEBUG [MutationStage:7774] 2012-05-14 05:44:07,026 SlabAllocator.java (line 105) 1 regions
now allocated in
> org.apache.cassandra.utils.SlabAllocator@7dce40cc
> DEBUG [MutationStage:7775] 2012-05-14 05:44:07,027 StorageProxy.java (line 357) Adding
hint for /172.15.2.165
> DEBUG [MutationStage:7775] 2012-05-14 05:44:07,027 Table.java (line 387) applying mutation
of row 66666666666666666666666666666664
> DEBUG [Thread-1959] 2012-05-14 05:44:12,135 Gossiper.java (line 222) Reseting version
for /172.15.2.165
> DEBUG [Thread-1960] 2012-05-14 05:44:12,135 Gossiper.java (line 222) Reseting version
for /172.15.2.165
>  INFO [GossipTasks:1] 2012-05-14 05:44:12,626 Gossiper.java (line 818) InetAddress /172.15.2.165
is now dead.
> DEBUG [GossipTasks:1] 2012-05-14 05:44:12,626 MessagingService.java
> (line 221) Resetting pool for /172.15.2.165
>  
> Pavel Polushkin.
> <thread dump.txt>


Mime
View raw message