cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2369) support replication decisions per-key
Date Wed, 15 Jun 2011 16:41:47 GMT


Jonathan Ellis commented on CASSANDRA-2369:

No, it's worse than that.

Let me give an example of a simple multi-node, multi-DC cluster: nodes A and M in DC1, nodes
B and N in DC2. So, within each DC's ring, nodes A, M, B, and N own keys in ranges (M, A], (A, M], (N, B], and (B, N], respectively.
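A minimal Python sketch of that ownership rule, assuming (purely for illustration) that each node's token is its own letter and that tokens compare as strings. `NODES` and `dc_ranges` are hypothetical names, not Cassandra APIs:

```python
# Hypothetical model: each node's token is its name; tokens compare as strings.
NODES = {"A": "DC1", "M": "DC1", "B": "DC2", "N": "DC2"}

def dc_ranges(nodes):
    """Map each node to its primary range (left, right] within its own DC's ring."""
    ranges = {}
    for dc in sorted(set(nodes.values())):
        ring = sorted(n for n, d in nodes.items() if d == dc)
        for i, node in enumerate(ring):
            # ring[-1] is the predecessor of the first node: the ring wraps around.
            ranges[node] = (ring[i - 1], node)
    return ranges

for node, (left, right) in sorted(dc_ranges(NODES).items()):
    print(f"{node}: ({left}, {right}]")
# A: (M, A]
# B: (N, B]
# M: (A, M]
# N: (B, N]
```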

If I write a row K with NTS (NetworkTopologyStrategy) {DC1: 1, DC2: 1}, then I know it will be on nodes M and N. So far
so good.
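The placement of K can be sketched the same way: for each DC, walk that DC's ring clockwise from the key and take the first RF nodes. This is a simplified model (no racks, string tokens), and `replicas` and `in_range` are hypothetical helpers, not NetworkTopologyStrategy's actual code:

```python
NODES = {"A": "DC1", "M": "DC1", "B": "DC2", "N": "DC2"}

def in_range(key, left, right):
    """True if key falls in the wrapping range (left, right]."""
    if left < right:
        return left < key <= right
    return key > left or key <= right  # the range wraps past the end of the ring

def replicas(key, nodes, rf_per_dc):
    """Simplified per-DC placement: first node clockwise from key, then the next rf-1."""
    placed = []
    for dc, rf in rf_per_dc.items():
        ring = sorted(n for n, d in nodes.items() if d == dc)
        start = next(i for i, n in enumerate(ring) if in_range(key, ring[i - 1], n))
        placed += [ring[(start + j) % len(ring)] for j in range(min(rf, len(ring)))]
    return placed

print(replicas("K", NODES, {"DC1": 1, "DC2": 1}))  # ['M', 'N']
```

With `{"DC1": 1, "DC2": 2}` the same walk would also pick B, since DC2 has only two nodes.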

What if I now repair node M? It knows it has to compare its data for range (A, B] with the
same range on node B, and its data for range (B, M] with the same range on node N. So it builds a Merkle tree for
each range and asks B and N to do the same; then they exchange trees to see whether the data is
in sync.
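M's repair plan falls out of splitting its range at the other DC's tokens and pairing each piece with the DC2 node that owns it. A sketch under the same assumptions (string tokens, one replica per DC); `repair_plan` and `owner` are hypothetical:

```python
NODES = {"A": "DC1", "M": "DC1", "B": "DC2", "N": "DC2"}

def in_range(token, left, right):
    """True if token falls in the wrapping range (left, right]."""
    if left < right:
        return left < token <= right
    return token > left or token <= right

def owner(token, ring):
    """Node in a sorted ring whose range (predecessor, node] contains token."""
    return next(n for i, n in enumerate(ring) if in_range(token, ring[i - 1], n))

def repair_plan(left, right, other_ring):
    """Split (left, right] at the other DC's tokens; pair each piece with its owner there."""
    cuts = [t for t in other_ring if t != right and in_range(t, left, right)]
    # Order cut points clockwise from `left`: tokens after it first, then any past the wrap.
    cuts.sort(key=lambda t: (t <= left, t))
    bounds = [left] + cuts + [right]
    return [((lo, hi), owner(hi, other_ring)) for lo, hi in zip(bounds, bounds[1:])]

dc2_ring = sorted(n for n, dc in NODES.items() if dc == "DC2")
for (lo, hi), peer in repair_plan("A", "M", dc2_ring):
    print(f"compare ({lo}, {hi}] with node {peer}")
# compare (A, B] with node B
# compare (B, M] with node N
```

Because every piece is cut at the other DC's tokens, each piece lies inside exactly one of that DC's ranges, so the owner of its right endpoint owns the whole piece.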

How does this change if we introduce this partitioner? M can no longer assume that keys it
has for range (A, B] should also be replicated to node B, and vice versa. You would have
to build a separate tree for each replica, i.e. instead of just one tree for (A, B], each replica
would need to build a tree for (A, B]-that-belongs-on-M, another tree for (A, B]-that-belongs-on-B,
and so forth for as many possible replicas as exist.
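To make the cost concrete, here is a toy version of what each replica would have to do once placement is chosen per row: bucket the rows in a range by each peer the row should also live on, then hash one tree per bucket. The row tuples, `merkle_root`, and `trees_per_replica` are all hypothetical, and the tree is a plain binary SHA-256 hash tree rather than Cassandra's token-range Merkle tree:

```python
import hashlib
from collections import defaultdict

def merkle_root(leaves):
    """Hex root of a binary SHA-256 hash tree over the given leaf byte strings."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd-sized levels by repeating the last hash
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

def trees_per_replica(rows, me):
    """Group rows in one range by each *other* node they belong on; one root per peer."""
    groups = defaultdict(list)
    for key, value, replicas in sorted(rows, key=lambda row: row[0]):
        for peer in replicas:
            if peer != me:
                groups[peer].append(f"{key}={value}".encode())
    return {peer: merkle_root(leaves) for peer, leaves in groups.items()}

# Rows M holds in (A, B], each carrying its own (hypothetical) replica set.
rows_on_m = [
    ("AX", "v1", ("M", "B")),
    ("AY", "v2", ("M", "N")),
    ("AZ", "v3", ("M", "B", "N")),
]
for peer, root in sorted(trees_per_replica(rows_on_m, "M").items()):
    print(peer, root[:12])
```

With token-only placement every row in (A, B] would land in a single bucket compared against B; here M ends up with one tree per distinct peer.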

There is a similar problem on bootstrap and node movement.  Instead of asking a single replica
to stream data from the range a new node is assuming, it will have to ask _all_ replicas that
may have rows for that range to make sure it doesn't miss any.
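The bootstrap difference can be illustrated by asking whether any single node holds every row the new node needs. The sketch assumes, for illustration only, that the rows and their current replicas are known up front; a real joining node cannot list them, which is why it would have to contact every node that may hold rows for the range. `sources_needed` is a hypothetical helper:

```python
def sources_needed(rows):
    """Pick nodes to stream from so that every (key, replicas) row is fetched at least once."""
    if not rows:
        return set()
    holders = [set(replicas) for _, replicas in rows]
    common = set.intersection(*holders)
    if common:
        # Every row shares a replica, so one source suffices (the token-only case).
        return {min(common)}
    # No single node holds every row, so fall back to all of them.
    return set().union(*holders)

token_only = [("C", ("M", "N")), ("D", ("M", "N"))]
per_key = [("C", ("M", "N")), ("D", ("M",)), ("E", ("N",))]
print(sorted(sources_needed(token_only)))  # ['M']
print(sorted(sources_needed(per_key)))     # ['M', 'N']
```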

> support replication decisions per-key
> -------------------------------------
>                 Key: CASSANDRA-2369
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0
> Currently the replication strategy gets a token and a keyspace with which to decide how
> to place replicas. For per-row replication this is insufficient because tokenization is lossy.
