cassandra-commits mailing list archives

From "Mariusz Gronczewski (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5154) Gossip sends removed node which causes restarted nodes to constantly create new threads
Date Fri, 25 Jan 2013 16:53:13 GMT


Mariusz Gronczewski commented on CASSANDRA-5154:

A rolling restart combined with removing system/LocationInfo fixed it in our production environment.
> Gossip sends removed node which causes restarted nodes to constantly create new threads
> ---------------------------------------------------------------------------------------
>                 Key: CASSANDRA-5154
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.7
>         Environment: centos 6, JVM 1.6.0_37
>            Reporter: Mariusz Gronczewski
> Our Cassandra cluster had 14 nodes, but it was mostly idle, so about 2 weeks ago we removed
> 3 of them (via standard decommission) and moved tokens to balance the load.
> Since then no node had been restarted, but last week, after restarting 2 of them, we observed
> that both of them spawn threads ( WRITE-/ where is one of removed nodes IPs
> ) until they hit the limit (which is 800 on our system), and then Cassandra dies. Nodes that were
> not restarted do not do that. There are no outgoing connections to those dead nodes.
> I noticed the dead nodes are still in nodetool gossipinfo on non-restarted nodes but not
> on restarted ones, so it seems they are not properly removed from gossip.
> Would a rolling restart work for fixing this, or is a full cluster stop-start required?
> trace from hanging threads:
> {code}
>  "WRITE-/" daemon prio=10 tid=0x00007f5fe8194000 nid=0x2fb2 waiting on condition [0x00007f6020de0000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for <0x00000007536a1160> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(
> 	at java.util.concurrent.LinkedBlockingQueue.take(
> 	at
> {code}
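
To check whether a node is accumulating threads like the one in the trace above, one option is to count thread names in a saved thread dump. The following is only a minimal sketch: the function name `count_threads` is hypothetical, it assumes the dump uses the HotSpot layout shown in the trace (each thread starts with its name in double quotes at the beginning of a line), and it is not part of Cassandra or of any JDK tool.

```python
import re
from collections import Counter

# Assumption: each thread in the dump begins with a header line whose first
# token is the thread name in double quotes, as in the trace quoted above.
THREAD_HEADER = re.compile(r'^"([^"]+)"', re.MULTILINE)


def count_threads(dump_text, prefix="WRITE-/"):
    """Count threads in dump_text whose name starts with prefix.

    Returns a Counter mapping each matching thread name to the number of
    threads carrying that name.
    """
    names = THREAD_HEADER.findall(dump_text)
    return Counter(name for name in names if name.startswith(prefix))
```

With a dump saved to a file (for example the output of the JDK's jstack tool), calling `count_threads(open(path).read()).most_common()` lists the most frequent matching thread names first. A count that keeps growing between dumps for the same name would match the behaviour described in the report.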

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: