cassandra-commits mailing list archives

From "T Jake Luciani (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-3810) reconsider rack awareness
Date Wed, 02 Sep 2015 13:58:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727379#comment-14727379 ]

T Jake Luciani edited comment on CASSANDRA-3810 at 9/2/15 1:57 PM:
-------------------------------------------------------------------

First off, I don't think we should remove rack awareness.

I think there are a couple of options here:
   - Provide information on rack replication placement via something like CASSANDRA-9667.
If we can make clear, before adding/removing nodes, what the state of the cluster will
look like, we can avoid obvious mistakes that are compounded when adding many nodes at once.

   - Make rack awareness and replica placement controllable by applications via NTS, so
replication factors can be created per DC and RACK (rough sketch below). This would require
CASSANDRA-8119.
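
As a rough illustration of that second option: replication options could grow from per-DC
keys to rack-qualified keys. The per-rack syntax below is purely hypothetical (per-rack
factors don't exist today, and the actual form would be decided by CASSANDRA-8119);
sketched here as Python dicts:

{code}
# Per-datacenter factors, as NetworkTopologyStrategy accepts them today:
current_options = {"class": "NetworkTopologyStrategy", "DC1": "3", "DC2": "2"}

# Hypothetical rack-qualified form, for illustration only (this syntax
# does not exist; it is the kind of thing CASSANDRA-8119 would enable):
hypothetical_options = {"class": "NetworkTopologyStrategy",
                        "DC1:RACK1": "2", "DC1:RACK2": "1", "DC2:RACK1": "2"}
{code}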


was (Author: tjake):
First off, I don't think we should remove rack awareness.

I think there are a couple of options here:
   - Provide information on rack replication placement via something like CASSANDRA-9667.
If we can make it clear beforehand what the state of the cluster will look like before or
after adding/removing nodes, we can avoid obvious mistakes that are compounded when adding
many nodes at once.

   - Make rack awareness and replica placement controllable by applications via NTS, so
replication factors can be created per DC and RACK. This would require CASSANDRA-8119.

> reconsider rack awareness
> -------------------------
>
>                 Key: CASSANDRA-3810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3810
>             Project: Cassandra
>          Issue Type: Task
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>
> We believed we wanted to be rack aware because we want to ensure that losing a rack
> affects only a single replica of any given row key.
> When using rack awareness, the first problem you run into, if you aren't careful, is
> that you induce hotspots as a result of rack-aware replica selection. Using the format
> {{rackname-nodename}}, consider a part of the ring that looks like this:
> {code}
> ...
> r1-n1
> r1-n2
> r1-n3
> r2-n1
> r3-n1
> r4-n1
> ...
> {code}
> Due to the rack awareness, {{r2-n1}} will be the second replica for all data whose
> primary replica is on {{r1-n1}}, {{r1-n2}} or {{r1-n3}}, since in each case replica
> selection is forced to skip over the remaining nodes in the already-represented rack.
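> To make the skip behavior concrete, here is a minimal Python sketch (not Cassandra's
> actual placement code; it also ignores the fallback to already-used racks once distinct
> racks run out) that walks the example ring:
> {code}
> # Simplified rack-aware replica selection: walk the ring clockwise from
> # the primary and take a node only if its rack is not yet represented.
> def rack(node):
>     return node.split("-")[0]
> 
> def replicas(ring, start, rf):
>     chosen, seen = [], set()
>     for step in range(len(ring)):
>         node = ring[(start + step) % len(ring)]
>         if rack(node) not in seen:
>             chosen.append(node)
>             seen.add(rack(node))
>         if len(chosen) == rf:
>             break
>     return chosen
> 
> ring = ["r1-n1", "r1-n2", "r1-n3", "r2-n1", "r3-n1", "r4-n1"]
> for i, node in enumerate(ring):
>     print(node, "->", replicas(ring, i, rf=3))
> # r1-n1 -> ['r1-n1', 'r2-n1', 'r3-n1']
> # r1-n2 -> ['r1-n2', 'r2-n1', 'r3-n1']
> # r1-n3 -> ['r1-n3', 'r2-n1', 'r3-n1']   <- r2-n1 is second replica for all three
> {code}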
> The way we end up allocating nodes in a cluster is to satisfy this criterion:
> * Any node in rack {{r}}, in a cluster with replication factor {{rf}}, must not have
> another node in {{r}} within {{rf-1}} steps in the ring in either direction.
> Any violation of this criterion implies the induction of hotspots due to rack awareness.
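> A checker for this rule is easy to sketch (again plain Python, assuming the
> {{rackname-nodename}} naming from the example above):
> {code}
> # Flag any node that has a same-rack neighbour within rf-1 ring positions
> # in either direction; each offending pair is reported from both sides.
> def rack(node):
>     return node.split("-")[0]
> 
> def violations(ring, rf):
>     n, bad = len(ring), []
>     for i, node in enumerate(ring):
>         for step in range(1, rf):
>             for j in ((i + step) % n, (i - step) % n):
>                 if j != i and rack(ring[j]) == rack(node):
>                     bad.append((node, ring[j]))
>     return bad
> 
> print(violations(["r1-n1", "r1-n2", "r1-n3", "r2-n1", "r3-n1", "r4-n1"], 3))
> # non-empty: the example ring above breaks the rule
> print(violations(["r1-n1", "r2-n1", "r3-n1", "r1-n2", "r2-n2", "r3-n2"], 3))
> # []: interleaving the racks satisfies it
> {code}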
> The realization I had a few days ago, however, is that *the rack awareness is not
> actually changing replica placement* when using this ring topology. In other words,
> *the way you have to use* rack awareness is to construct the ring such that *the rack
> awareness is a NOOP*.
> So, questions:
> * Is there any non-hotspot-inducing use case where rack awareness can be used ("used"
> in the sense that it actually changes placement relative to non-awareness) effectively
> without satisfying the criterion above?
> * Is it misleading and counterproductive to teach people (via documentation, for
> example) to rely on rack awareness in their rings, instead of just giving them the rule
> above for ring topology?
> * Would it be a better service to the user to provide an easy way to *ensure* that the
> ring topology adheres to this criterion (such as refusing to bootstrap a new node if
> rack awareness is requested, and taking it into account during automatic token selection
> (does anyone use that?)), than to "silently" generate hotspots by altering the
> replication strategy? (The "silence" problem is magnified by the fact that {{nodetool
> ring}} doesn't reflect this; the user must take into account both the RF *and* the racks
> when interpreting {{nodetool ring}} output.)
> FWIW, internally we just go with the criterion outlined above, and we have a separate
> tool which will print the *actual* ownership percentage of each node in the ring (based
> on the thrift {{describe_ring}} call). Any ring with node placements that violate the
> criterion is effectively a bug / a misconfigured ring, so only in the event of mistakes
> are we "using" the rack awareness (in the sense of "use" defined above).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
