cassandra-commits mailing list archives

From "Stefan Podkowinski (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12281) Gossip blocks on startup when another node is bootstrapping
Date Wed, 02 Nov 2016 13:20:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628949#comment-15628949 ]

Stefan Podkowinski commented on CASSANDRA-12281:
------------------------------------------------

OK, I'm going to pull out the code that runs the calculation only once per keyspace setting and
open a new ticket for it, as this doesn't seem to be the actual problem at hand here.

The attached patches will now just make sure that we don't block on incoming gossip messages
during pending range calculation.
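The general idea of not holding a lock that gossip needs for the whole duration of an expensive calculation can be sketched as a copy-then-compute pattern: take a short read lock only to snapshot the shared state, release it, and do the long-running work on the private copy. This is a minimal sketch under that assumption, not the actual patch: the class `SnapshotThenCompute` and its methods `updateNormalToken` and `recalculate` are hypothetical and are not Cassandra's `TokenMetadata` API, and counting tokens per endpoint merely stands in for the real pending range calculation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical class: illustrates copy-then-compute, not Cassandra's TokenMetadata.
public class SnapshotThenCompute {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    // Shared ring state (token -> endpoint), guarded by 'lock'.
    private final TreeMap<Long, String> tokenToEndpoint = new TreeMap<>();
    // Last computed result, published through a volatile reference.
    private volatile Map<String, Integer> lastResult = Collections.emptyMap();

    // Gossip-side update: holds the write lock only for the map mutation.
    public void updateNormalToken(long token, String endpoint) {
        lock.writeLock().lock();
        try {
            tokenToEndpoint.put(token, endpoint);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Expensive recalculation: holds the read lock only while copying the state.
    public void recalculate() {
        TreeMap<Long, String> snapshot;
        lock.readLock().lock();
        try {
            snapshot = new TreeMap<>(tokenToEndpoint);
        } finally {
            lock.readLock().unlock();
        }

        // Stand-in for the long-running work. It runs on the private copy with no
        // lock held, so a concurrent updateNormalToken() call does not wait for it.
        Map<String, Integer> tokensPerEndpoint = new HashMap<>();
        for (String endpoint : snapshot.values()) {
            tokensPerEndpoint.merge(endpoint, 1, Integer::sum);
        }
        lastResult = Collections.unmodifiableMap(tokensPerEndpoint);
    }

    public Map<String, Integer> lastResult() {
        return lastResult;
    }

    public static void main(String[] args) {
        SnapshotThenCompute ring = new SnapshotThenCompute();
        ring.updateNormalToken(10L, "node-a");
        ring.updateNormalToken(20L, "node-b");
        ring.updateNormalToken(30L, "node-a");
        ring.recalculate();
        System.out.println(ring.lastResult().get("node-a")); // 2
    }
}
```

The trade-off of this pattern is that the result can lag behind token updates that arrive while the calculation is running, so presumably another recalculation has to be scheduled after such updates.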

||2.2||3.0||trunk||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12281-2.2]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12281-3.0]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12281-trunk]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12281-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12281-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12281-trunk-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12281-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12281-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12281-trunk-testall/]|

> Gossip blocks on startup when another node is bootstrapping
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-12281
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12281
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Eric Evans
>            Assignee: Stefan Podkowinski
>         Attachments: restbase1015-a_jstack.txt
>
>
> In our cluster, normal node startup times (after a drain on shutdown) are less than 1
> minute. However, when another node in the cluster is bootstrapping, the same node startup
> takes nearly 30 minutes to complete, the apparent result of gossip blocking on pending range
> calculations.
> {noformat}
> $ nodetool-a tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MutationStage                     0         0           1840         0                 0
> ReadStage                         0         0           2350         0                 0
> RequestResponseStage              0         0             53         0                 0
> ReadRepairStage                   0         0              1         0                 0
> CounterMutationStage              0         0              0         0                 0
> HintedHandoff                     0         0             44         0                 0
> MiscStage                         0         0              0         0                 0
> CompactionExecutor                3         3            395         0                 0
> MemtableReclaimMemory             0         0             30         0                 0
> PendingRangeCalculator            1         2             29         0                 0
> GossipStage                       1      5602            164         0                 0
> MigrationStage                    0         0              0         0                 0
> MemtablePostFlush                 0         0            111         0                 0
> ValidationExecutor                0         0              0         0                 0
> Sampler                           0         0              0         0                 0
> MemtableFlushWriter               0         0             30         0                 0
> InternalResponseStage             0         0              0         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                     0
> COUNTER_MUTATION             0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
> {noformat}
> A full thread dump is attached, but the relevant bit seems to be here:
> {noformat}
> [ ... ]
> "GossipStage:1" #1801 daemon prio=5 os_prio=0 tid=0x00007fe4cd54b000 nid=0xea9 waiting on condition [0x00007fddcf883000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000004c1e922c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> 	at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:174)
> 	at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:160)
> 	at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2023)
> 	at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1682)
> 	at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
> 	at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1165)
> 	at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1128)
> 	at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58)
> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> [ ... ]
> {noformat}
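The thread dump shows `GossipStage:1` parked in `ReentrantReadWriteLock$WriteLock.lock` on a fair lock (`FairSync`). A writer on such a lock cannot proceed while any other thread holds the read lock. The sketch below reproduces that behaviour in isolation, assuming (as the active `PendingRangeCalculator` task in the tpstats output suggests, though the dump does not show the lock holder) that a long-running calculation is holding the read lock. The class `FairLockBlockingDemo` and method `probeWriterBlocking` are hypothetical, and a timed `tryLock` is used instead of `lock()` so the demo cannot hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo: a writer on a fair ReentrantReadWriteLock waits while a reader holds the lock.
public class FairLockBlockingDemo {

    // Returns {acquiredWhileReaderActive, acquiredAfterReaderReleased}.
    static boolean[] probeWriterBlocking(long timeoutMillis) {
        // Fair mode, matching the ReentrantReadWriteLock$FairSync in the thread dump.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        CountDownLatch readerHoldsLock = new CountDownLatch(1);
        CountDownLatch releaseReader = new CountDownLatch(1);

        // Stand-in for a long-running calculation that holds the read lock.
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHoldsLock.countDown();
                releaseReader.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });

        try {
            reader.start();
            readerHoldsLock.await();

            // Stand-in for the gossip thread that needs the write lock to update tokens.
            boolean during = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (during) {
                lock.writeLock().unlock();
            }

            // Let the "calculation" finish; its finally block drops the read lock.
            releaseReader.countDown();
            reader.join();

            boolean after = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (after) {
                lock.writeLock().unlock();
            }
            return new boolean[] {during, after};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Never leave the reader parked, even on an error path (countDown is idempotent at zero).
            releaseReader.countDown();
        }
    }

    public static void main(String[] args) {
        boolean[] r = probeWriterBlocking(200);
        System.out.println("write lock while reader active: " + r[0]);   // false
        System.out.println("write lock after reader released: " + r[1]); // true
    }
}
```

With `lock()` instead of a timed `tryLock`, the writer would simply stay parked for as long as the read lock is held, which is consistent with the single active `GossipStage` thread and the large pending count in the tpstats output.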



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
