cassandra-commits mailing list archives

From "Oleg Kibirev (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-5456) Large number of bootstrapping nodes cause gossip to stop working
Date Thu, 11 Apr 2013 20:09:16 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kibirev updated CASSANDRA-5456:
------------------------------------

    Attachment: PendingRangeCalculatorService.patch

Make a copy of bootstrapTokens before the time-consuming loop rather than holding the
synchronized lock for the loop's entire duration.
                
> Large number of bootstrapping nodes cause gossip to stop working
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-5456
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5456
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.10
>            Reporter: Oleg Kibirev
>         Attachments: PendingRangeCalculatorService.patch
>
>
> A long-running section of code in PendingRangeCalculatorService is synchronized on bootstrapTokens.
> Gossip waits for the same lock, so it stops working when a large number of nodes
> (hundreds in our case) are bootstrapping. Consequently, the whole cluster becomes non-functional.
>
> I experimented with the following change in PendingRangeCalculatorService.java, and it
> resolved the problem in our case. The prior code held the synchronized block around the entire for loop.
> // Copy the map under the lock so the loop below runs without holding it.
> synchronized (bootstrapTokens) {
>     bootstrapTokens = new LinkedHashMap<Token, InetAddress>(bootstrapTokens);
> }
> for (Map.Entry<Token, InetAddress> entry : bootstrapTokens.entrySet())
> {
>     InetAddress endpoint = entry.getValue();
>     allLeftMetadata.updateNormalToken(entry.getKey(), endpoint);
>     for (Range<Token> range : strategy.getAddressRanges(allLeftMetadata).get(endpoint))
>         pendingRanges.put(range, endpoint);
>     allLeftMetadata.removeEndpoint(endpoint);
> }
>  
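
The copy-then-iterate idea in the change above can be shown in isolation. The following
is only a minimal sketch, not Cassandra code: the class SnapshotIteration, its methods,
and the String-to-String map are hypothetical stand-ins for bootstrapTokens and its
callers. The lock is held just long enough to copy the map, and the slow loop then
runs over the private copy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; none of these names exist in Cassandra.
class SnapshotIteration {
    // Stand-in for the shared bootstrapTokens map (token -> endpoint).
    private final Map<String, String> shared = new LinkedHashMap<>();

    // Writers (gossip, in the issue's scenario) take the lock only briefly.
    void put(String token, String endpoint) {
        synchronized (shared) {
            shared.put(token, endpoint);
        }
    }

    // Hold the lock only for the duration of the copy.
    Map<String, String> snapshot() {
        synchronized (shared) {
            return new LinkedHashMap<>(shared);
        }
    }

    // The long-running loop iterates over the copy, so the lock is free meanwhile.
    int processAll() {
        Map<String, String> copy = snapshot();
        int processed = 0;
        for (Map.Entry<String, String> entry : copy.entrySet()) {
            // Simulate a writer updating the shared map mid-loop. This is safe
            // because the loop iterates the copy, not the shared map.
            put("late-" + entry.getKey(), entry.getValue());
            processed++;
        }
        return processed;
    }
}
```

The trade-off is that the loop works on a snapshot: entries added to the shared map
after the copy is taken are not seen until the next pass.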

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
