cassandra-commits mailing list archives

From "Nick Bailey (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4756) Bulk loading snapshots creates RF^2 copies of the data
Date Tue, 09 Oct 2012 23:20:02 GMT


Nick Bailey commented on CASSANDRA-4756:

Imagine a cluster of 3 nodes with RF=3, so each node holds a full copy of the data. If I
snapshot one of those nodes and load that snapshot with the bulkloader, it streams a full
copy of the data to every node. Bulk loading this one snapshot therefore creates RF copies
of the data. Bulk loading the snapshots from the remaining two nodes repeats the process,
so 3 snapshots each streaming 3 copies of the data = RF^2 copies.
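
The arithmetic above can be checked with a small model. This is only a sketch under
simplifying assumptions, not how the bulkloader is implemented: keys are integers, the
ring places a key on node `key % num_nodes` plus the next RF-1 nodes, and the helper names
(`replicas`, `snapshot`, `bulk_load`) are made up for illustration.

```python
from collections import Counter


def replicas(key, num_nodes, rf):
    """Hypothetical placement: primary node plus the next rf-1 nodes on the ring."""
    primary = key % num_nodes
    return [(primary + i) % num_nodes for i in range(rf)]


def snapshot(node, keys, num_nodes, rf):
    """Keys a node holds, i.e. every key for which it is a replica."""
    return [k for k in keys if node in replicas(k, num_nodes, rf)]


def bulk_load(snapshots, num_nodes, rf):
    """Count how many copies of each key get streamed into the cluster.

    Each key found in a snapshot is streamed to all of its replicas,
    regardless of whether another snapshot already delivered it.
    """
    streamed = Counter()
    for snap in snapshots:
        for key in snap:
            streamed[key] += len(replicas(key, num_nodes, rf))
    return streamed


num_nodes, rf, keys = 3, 3, range(10)
one = bulk_load([snapshot(0, keys, num_nodes, rf)], num_nodes, rf)
everything = bulk_load(
    [snapshot(n, keys, num_nodes, rf) for n in range(num_nodes)], num_nodes, rf
)
print(set(one.values()), set(everything.values()))
```

In this model, loading one snapshot streams RF copies of each key, and loading the
snapshots from all nodes streams RF * RF copies.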

> Bulk loading snapshots creates RF^2 copies of the data
> ------------------------------------------------------
>                 Key: CASSANDRA-4756
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Nick Bailey
> Since a cluster snapshot will contain RF copies of each piece of data, bulkloading all
> of those snapshots will create RF^2 copies of each piece of data.
> Not sure what the solution here is. Ideally we would merge the RF copies of the data
> before sending to the cluster. This would solve any inconsistencies that existed when the
> snapshot was taken.
> A more naive approach of only loading one of the RF copies and assuming there are no
> inconsistencies might be an easier goal for the near term though.
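
The two options in the description can be contrasted with a short sketch. The ticket does
not specify how the merge would work, so this assumes each snapshot is a dict mapping a
key to a `(value, write_timestamp)` pair and that the newest timestamp wins. The function
names `merge_snapshots` and `load_first_copy_only` are hypothetical.

```python
def merge_snapshots(snapshots):
    """Merge the RF copies, keeping the newest version of each key.

    Assumption: conflicts are resolved by highest write timestamp.
    """
    merged = {}
    for snap in snapshots:
        for key, (value, ts) in snap.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return merged


def load_first_copy_only(snapshots):
    """Naive approach: keep whichever copy of a key is seen first."""
    chosen = {}
    for snap in snapshots:
        for key, cell in snap.items():
            chosen.setdefault(key, cell)
    return chosen


# Two replicas disagree on k1: node_a missed the later write.
node_a = {"k1": ("old", 1), "k2": ("x", 5)}
node_b = {"k1": ("new", 2), "k2": ("x", 5)}

print(merge_snapshots([node_a, node_b])["k1"])
print(load_first_copy_only([node_a, node_b])["k1"])
```

Either way only one copy per key is sent to the cluster, but the naive version can pick a
stale value when the replicas were inconsistent at snapshot time, which is the trade-off
the description points out.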
