cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12015) Rebuilding from another DC should use different sources
Date Thu, 16 Jun 2016 19:55:05 GMT


Paulo Motta commented on CASSANDRA-12015:

bq. However, it does not help the original concern of this JIRA, which is to have some sort
of randomization/round-robin selection for the source replica to stream data from.

I think there are two concerns here:
a) Improve source diversity for single node rebuilds
b) For simultaneous rebuilds, divide the load more evenly across replicas.

From my understanding, a) is easily solvable by using token order instead of proximity
to pick replicas to stream from, but this does not solve b), because primary replicas from
simultaneous rebuilds might become overloaded.
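
To illustrate a), here is a minimal sketch, not Cassandra's actual streaming code. The 3-node DC1 ring, the node names, the distance table and the helper functions are all hypothetical. It compares picking the closest replica for every range with picking the first replica in token order, for RF=3 on 3 nodes:

```python
from collections import Counter

# Hypothetical DC1 ring (token -> node). With RF=3 on 3 nodes,
# every node is a replica for every range.
RING = {0: "n1", 100: "n2", 200: "n3"}
RF = 3

def replicas_in_token_order(token):
    """Replicas for the range owned by `token`, walking the ring clockwise."""
    tokens = sorted(RING)
    start = tokens.index(token)
    return [RING[tokens[(start + i) % len(tokens)]] for i in range(RF)]

def pick_by_proximity(replicas, distance):
    """Choose the replica closest to the rebuilding node."""
    return min(replicas, key=lambda node: distance[node])

def pick_by_token_order(replicas):
    """Choose the primary replica, i.e. the first one in token order."""
    return replicas[0]

# Assumed proximity ordering as seen from the rebuilding node in DC2.
distance = {"n1": 1, "n2": 2, "n3": 3}

by_proximity = Counter(
    pick_by_proximity(replicas_in_token_order(t), distance) for t in sorted(RING)
)
by_token = Counter(
    pick_by_token_order(replicas_in_token_order(t)) for t in sorted(RING)
)

print(by_proximity)  # every range is streamed from n1
print(by_token)      # one range each from n1, n2 and n3
```

Under these assumptions, proximity sends all three ranges to a single source, while token order spreads them across the three nodes. The sketch only covers a single rebuilding node, so it says nothing about b).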

Maybe b) can be solved without keeping state by using a paired replica approach similar to

> Rebuilding from another DC should use different sources
> -------------------------------------------------------
>                 Key: CASSANDRA-12015
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Fabien Rousseau
> Currently, when adding a new DC (ex: DC2) and rebuilding it from an existing DC (ex: DC1), only the closest replica is used as a "source of data".
> It works but is not optimal, because in case of an RF=3 and 3 nodes cluster, only one node in DC1 is streaming the data to DC2.
> To build the new DC in a reasonable time, it would be better, in that case, to stream from multiple sources, thus distributing more evenly the load.

This message was sent by Atlassian JIRA
