cassandra-commits mailing list archives

From "Chris Burroughs (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-6244) calculatePendingRanges could be asynchronous on 1.2 too
Date Sat, 09 Nov 2013 05:23:19 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Burroughs updated CASSANDRA-6244:
---------------------------------------

    Attachment: PRC.png

I restarted a ~60-node cluster with this patch and slapped a Timer on calculatePendingRanges.
Units are microseconds. Calculations taking significantly longer than RING_DELAY occur on
all nodes.
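
For readers who want to reproduce this kind of measurement, here is a minimal sketch of timing a call in microseconds and comparing it against a ring delay. It is not the instrumentation used above: the class name, the busy-loop standing in for calculatePendingRanges, and the 30-second RING_DELAY value are all assumptions made for illustration, and System.nanoTime stands in for whatever Timer was actually used.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of Cassandra.
final class PendingRangeTiming {
    // Assumption: a 30 s ring delay, chosen only for illustration.
    static final long RING_DELAY_MICROS = TimeUnit.SECONDS.toMicros(30);

    /** Runs the task and returns its wall-clock duration in microseconds. */
    static long timeMicros(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
    }

    /** True when a measured duration is longer than the assumed ring delay. */
    static boolean exceedsRingDelay(long elapsedMicros) {
        return elapsedMicros > RING_DELAY_MICROS;
    }

    public static void main(String[] args) {
        // Stand-in workload; the real calculation is not reproduced here.
        long elapsed = timeMicros(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
        });
        System.out.println("elapsed micros: " + elapsed);
        System.out.println("longer than ring delay: " + exceedsRingDelay(elapsed));
    }
}
```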

During the rolling restart, upgraded nodes were fine (success!), while the last ones to be
upgraded (which had been exposed to all of the restarts) were effectively dead, with 25k
pending Gossip tasks.
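
To illustrate the general idea of keeping a slow calculation off the thread that triggers it, here is a minimal sketch. It is not the attached patch and not Cassandra's code: the class, its method names, and the 50 ms sleep standing in for the calculation are hypothetical. The caller (playing the role of the gossip thread) hands the work to a single-threaded executor and returns immediately, so its own queue does not back up behind the calculation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not part of Cassandra.
final class AsyncPendingRanges {
    // One worker thread, so calculations run one at a time and in order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicInteger completed = new AtomicInteger();

    /** Stand-in for the expensive calculation (assumed to take ~50 ms). */
    private void calculate() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        completed.incrementAndGet();
    }

    /** Called from the simulated gossip thread; queues work and returns at once. */
    void update() {
        executor.submit(this::calculate);
    }

    int completedCount() {
        return completed.get();
    }

    /** Stops accepting work and waits for queued calculations to finish. */
    boolean shutdownAndWait() {
        executor.shutdown();
        try {
            return executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncPendingRanges ranges = new AsyncPendingRanges();
        long start = System.nanoTime();
        ranges.update();
        ranges.update();
        long submitMicros = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
        System.out.println("caller blocked for micros: " + submitMicros);
        ranges.shutdownAndWait();
        System.out.println("calculations completed: " + ranges.completedCount());
    }
}
```

A single worker thread is used here so that queued calculations cannot run concurrently with each other; whether that matches the real patch is not something this message states.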

> calculatePendingRanges could be asynchronous on 1.2 too
> -------------------------------------------------------
>
>                 Key: CASSANDRA-6244
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6244
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 1.2, AWS
>            Reporter: Ryan Fowler
>            Assignee: Ryan Fowler
>             Fix For: 1.2.12, 2.0.3
>
>         Attachments: 6244.txt, CASSANDRA-6244-cassandra-2.0.txt, PRC.png, escalating-phi.txt
>
>
> calculatePendingRanges can hang up the Gossip thread to the point of a node marking all
> the other nodes down.
> I noticed that the same problem was resolved with CASSANDRA-5135, so I attempted to port
> the patch from that issue to the 1.2 codebase.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
