cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4756) Bulk loading snapshots creates RF^2 copies of the data
Date Wed, 10 Oct 2012 16:15:03 GMT


Jonathan Ellis commented on CASSANDRA-4756:

Well, let me give a trivial example addressing both of those.

Suppose I have 3 nodes, A, C, and D, and RF=2. Row 1 is replicated to A and C. We add node B,
then bulk load the snapshots with my scheme.

The snapshot of A, loaded with --one-copy=0, would send that row to A. The snapshot of C, loaded
with --one-copy=1, would send that row to B.

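
A minimal Python sketch of the example above, under stated assumptions. It assumes SimpleStrategy-style placement (a row is owned by the first node whose token is at or after the row's token, wrapping around the ring, and the next RF-1 nodes clockwise hold the other copies), made-up tokens for A, B, C, D and row 1, and one plausible reading of the proposed --one-copy=i option that is consistent with the example: stream each row only to its i-th replica in the current ring. The helper names (replicas, load_one_copy, load_naive) are hypothetical and are not part of sstableloader.

```python
import bisect

RF = 2

def replicas(ring, token, rf=RF):
    """Return the rf replicas for a row token, SimpleStrategy-style (assumed):
    the first node whose token is >= the row token owns it (wrapping around),
    and the next rf - 1 nodes clockwise hold the other copies."""
    nodes = sorted(ring.items(), key=lambda kv: kv[1])  # (name, token) by token
    tokens = [t for _, t in nodes]
    start = bisect.bisect_left(tokens, token) % len(nodes)
    return [nodes[(start + k) % len(nodes)][0] for k in range(rf)]

def load_one_copy(row_token, ring, i):
    """Hypothetical --one-copy=i: stream the row only to its i-th replica."""
    return [replicas(ring, row_token)[i]]

def load_naive(row_token, ring):
    """Plain bulk load: stream the row to every current replica."""
    return replicas(ring, row_token)

old_ring = {"A": 10, "C": 50, "D": 80}           # hypothetical tokens
new_ring = {"A": 10, "B": 30, "C": 50, "D": 80}  # B joins between A and C
ROW_1 = 5                                        # hypothetical token for row 1

print(replicas(old_ring, ROW_1))  # ['A', 'C']
print(replicas(new_ring, ROW_1))  # ['A', 'B']

# The snapshots of A and C both contain row 1.
print(load_one_copy(ROW_1, new_ring, 0))  # A's snapshot -> ['A']
print(load_one_copy(ROW_1, new_ring, 1))  # C's snapshot -> ['B']

# Without the option, each snapshot streams its copy to both new replicas.
naive = load_naive(ROW_1, new_ring) + load_naive(ROW_1, new_ring)
print(len(naive))  # 4, i.e. RF ** 2
```

Under these assumptions the two snapshots together place exactly RF = 2 copies of row 1, one on each of its new replicas, while loading both snapshots without the option streams four copies.
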
> Bulk loading snapshots creates RF^2 copies of the data
> ------------------------------------------------------
>                 Key: CASSANDRA-4756
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Nick Bailey
> Since a cluster snapshot will contain RF copies of each piece of data, bulk loading all
> of those snapshots will create RF^2 copies of each piece of data.
> Not sure what the solution here is. Ideally we would merge the RF copies of the data
> before sending them to the cluster. This would resolve any inconsistencies that existed when the
> snapshot was taken.
> A more naive approach, loading only one of the RF copies and assuming there are no
> inconsistencies, might be an easier goal for the near term, though.
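
The arithmetic behind the issue title, and the merge the description suggests, can be sketched in Python. Assumptions: each of the RF replicas' snapshots holds one copy of a row, and the bulk loader streams every copy it reads to all RF current replicas; the merge is a simplified per-cell last-write-wins (newest timestamp kept) that ignores tombstones, TTLs, and Cassandra's tie-breaking rules. The function names and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)
Row = Dict[str, Cell]   # column name -> cell

def copies_streamed(rf: int, merge_first: bool) -> int:
    """Copies of one row that end up in the cluster after loading all snapshots."""
    source_copies = 1 if merge_first else rf  # merging collapses RF copies to one
    return source_copies * rf                 # each streamed copy goes to all rf replicas

def merge_copies(copies: List[Row]) -> Row:
    """Merge replica copies cell by cell, keeping the newest timestamp
    (simplified last-write-wins; on equal timestamps the first copy is kept)."""
    merged: Row = {}
    for row in copies:
        for column, cell in row.items():
            if column not in merged or cell[0] > merged[column][0]:
                merged[column] = cell
    return merged

print(copies_streamed(3, merge_first=False))  # 9
print(copies_streamed(3, merge_first=True))   # 3

# Hypothetical inconsistent snapshot: replica C missed the latest write to "email".
copy_a = {"name": (100, "nick"), "email": (200, "new@example.com")}
copy_c = {"name": (100, "nick"), "email": (150, "old@example.com")}
print(merge_copies([copy_a, copy_c]))
# {'name': (100, 'nick'), 'email': (200, 'new@example.com')}
```

Merging first both cuts the streamed copies from RF^2 to RF and lets the newer cell win, which is the inconsistency repair the description asks for; the naive alternative skips the merge and so keeps whichever copy happens to be loaded.
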
