cassandra-user mailing list archives

From Jeff Jirsa <jji...@gmail.com>
Subject Re: Modify keyspace replication strategy and rebalance the nodes
Date Mon, 18 Sep 2017 22:03:39 GMT
No worries, that makes both of us, my first contribution to this thread was
similarly going-too-fast and trying to remember things I don't use often (I
thought originally SimpleStrategy would consult the EC2 snitch, but it
doesn't).

- Jeff

On Mon, Sep 18, 2017 at 1:56 PM, Jon Haddad <jonathan.haddad@gmail.com>
wrote:

> Sorry, you’re right.  This is what happens when you try to do two things
> at once.  Google too quickly, look like an idiot.  Thanks for the
> correction.
>
>
> On Sep 18, 2017, at 1:37 PM, Jeff Jirsa <jjirsa@gmail.com> wrote:
>
> For what it's worth, the problem isn't the snitch, it's the replication
> strategy - he's using the right snitch, but SimpleStrategy ignores it.
>
> That's the same reason that adding a new DC doesn't work - the replication
> strategy is DC-agnostic, and changing it safely IS the problem.
>
>
>
> --
> Jeff Jirsa
>
>
> On Sep 18, 2017, at 11:46 AM, Jon Haddad <jonathan.haddad@gmail.com>
> wrote:
>
> For those of you who like trivia, SimpleSnitch is hard-coded to report
> every node as being in datacenter "datacenter1" and rack "rack1"; there's
> no way around it.  https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/locator/SimpleSnitch.java#L28
>
> I would do this by setting up a new DC. Trying to do it with the existing
> one is going to leave you in a state where most queries return incorrect
> results (2/3 of queries at ONE and 1/2 of queries at QUORUM) until you
> finish repair.
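>
> A rough sketch of the new-DC route (keyspace and DC names below are
> placeholders, not Dominik's actual ones - substitute your own):
>
> ```sql
> -- 1. Bring up the new DC's nodes (distinct dc name from the snitch),
> --    then extend the keyspace's replication to cover it:
> ALTER KEYSPACE my_ks WITH replication = {
>   'class': 'NetworkTopologyStrategy',
>   'old_dc': 3, 'new_dc': 3};
> -- 2. On each new node, stream the existing data over:
> --      nodetool rebuild -- old_dc
> -- 3. Point clients at the new DC, drop 'old_dc' from the replication
> --    map, and decommission the old nodes.
> ```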
>
> On Sep 18, 2017, at 11:41 AM, Jeff Jirsa <jjirsa@gmail.com> wrote:
>
> The hard part here is nobody's going to be able to tell you exactly what's
> involved in fixing this because nobody sees your ring
>
> And since you're using vnodes and have a nontrivial number of instances,
> sharing that ring (and doing anything actionable with it) is nontrivial.
>
> If you weren't using vnodes, you could just fix the distribution and decom
> extra nodes afterward.
>
> I thought - but don't have time or energy to check - that the Ec2Snitch
> would be rack-aware even when using SimpleStrategy. If that's not the
> case (as you seem to indicate), then you're in a weird spot - you can't go
> to NTS trivially, because doing so will reassign your replicas to be
> rack/AZ-aware, certainly violating your consistency guarantees.
>
> If you can change your app to temporarily write with ALL and read with
> ALL, and then run repair, then immediately ALTER the keyspace, then run
> repair again, then drop back to whatever consistency you're using, you can
> probably get through it. The challenge is that ALL gets painful if you lose
> any instance.
>
> But please test in a lab, and note that this is inherently dangerous, I'm
> not advising you to do it, though I do believe it can be made to work.
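>
> The sequence above, sketched as commands (keyspace name and DC/RF values
> are placeholders; as noted, lab-test this first):
>
> ```shell
> # 1. Switch the app to CL=ALL for both reads and writes.
> # 2. Repair everything under the old replica placement:
> nodetool repair my_ks
> # 3. Immediately change the strategy, e.g. in cqlsh:
> #      ALTER KEYSPACE my_ks WITH replication =
> #        {'class': 'NetworkTopologyStrategy', 'my_dc': 3};
> # 4. Repair again under the new placement:
> nodetool repair my_ks
> # 5. Drop the app back to its normal consistency level.
> ```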
>
>
>
>
>
> --
> Jeff Jirsa
>
>
> On Sep 18, 2017, at 11:18 AM, Dominik Petrovic <dominik.petrovic@mail.ru.
> INVALID> wrote:
>
> @jeff what do you think is the best approach here to fix this problem?
> Thank you all for helping me.
>
>
> Thursday, September 14, 2017 3:28 PM -07:00 from kurt greaves <
> kurt@instaclustr.com>:
>
> Sorry, that only applies if you're using NTS. You're right that
> SimpleStrategy won't work very well in this case. To migrate you'll likely
> need to do a DC migration to ensure no downtime, as replica placement will
> change even if RF stays the same.
>
> On 15 Sep. 2017 08:26, "kurt greaves" <kurt@instaclustr.com> wrote:
>
> If you have racks configured and lose nodes you should replace the node
> with one from the same rack. You then need to repair, and definitely don't
> decommission until you do.
>
> Also 40 nodes with 256 vnodes is not a fun time for repair.
>
> On 15 Sep. 2017 03:36, "Dominik Petrovic" <dominik.petrovic@mail.ru.invalid>
> wrote:
>
> @jeff,
> I'm using 3 availability zones. During the life of the cluster we lost
> some nodes and retired others, and we ended up having some of the data
> written/replicated in a single availability zone. We saw it with nodetool
> getendpoints.
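>
> For reference, that check looks like this (keyspace, table, and key are
> placeholders); if every returned IP sits in one AZ, that partition's
> replicas are co-located:
>
> ```shell
> # List the replica nodes holding a given partition key:
> nodetool getendpoints my_ks my_table some_key
> # Compare the returned IPs against the rack/AZ column of:
> nodetool status my_ks
> ```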
> Regards
>
>
> Thursday, September 14, 2017 9:23 AM -07:00 from Jeff Jirsa <
> jjirsa@gmail.com>:
>
> With one datacenter/region, what did you discover in the outage that you
> think you'll solve with NetworkTopologyStrategy? It should be equivalent
> for a single DC.
>
> --
> Jeff Jirsa
>
>
> On Sep 14, 2017, at 8:47 AM, Dominik Petrovic <dominik.petrovic@mail.ru.
> INVALID> wrote:
>
> Thank you for the replies!
>
> @jeff my current cluster details are:
> 1 datacenter
> 40 nodes, with vnodes=256
> RF=3
> What is your advice? It is a production cluster, so I need to be very
> careful with it.
> Regards
>
>
> Thu, 14 Sep 2017 -2:47:52 -0700 from Jeff Jirsa <jjirsa@gmail.com>:
>
> The token distribution isn't going to change - the way Cassandra maps
> replicas will change.
>
> How many data centers/regions will you have when you're done? What's your
> RF now? You definitely need to run repair before you ALTER, but you've got
> a bit of a race here between the repairs and the ALTER, which you MAY be
> able to work around if we know more about your cluster.
>
> How many nodes
> How many regions
> How many replicas per region when you're done?
>
>
>
>
> --
> Jeff Jirsa
>
>
> On Sep 13, 2017, at 2:04 PM, Dominik Petrovic <dominik.petrovic@mail.ru.
> INVALID> wrote:
>
> Dear community,
> I'd like to receive additional info on how to modify a keyspace
> replication strategy.
>
> My Cassandra cluster is on AWS, Cassandra 2.1.15 using vnodes. The
> cluster's snitch is configured as Ec2Snitch, but the keyspace the
> developers created uses replication class SimpleStrategy with
> replication_factor = 3.
>
> During an outage last week we noticed this discrepancy in the
> configuration, and we would now like to fix the issue by switching to
> NetworkTopologyStrategy.
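>
> For context, the mismatch looks like this (keyspace and DC names here are
> illustrative, not our real ones):
>
> ```sql
> -- What the developers created (ignores the Ec2Snitch topology):
> CREATE KEYSPACE my_ks WITH replication =
>   {'class': 'SimpleStrategy', 'replication_factor': 3};
> -- What an Ec2Snitch cluster would normally use; the DC name
> -- comes from the EC2 region as reported by the snitch:
> --   {'class': 'NetworkTopologyStrategy', 'us-east': 3}
> ```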
>
> What are the suggested steps to perform?
> For Cassandra 2.1 I found only this doc: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsChangeKSStrategy.html
> which does not mention anything about repairing the cluster.
>
> For Cassandra 3 I found this other doc: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsChangeKSStrategy.html
> which also involves the cluster repair operation.
>
> On a test cluster I tried the steps for Cassandra 2.1, but the token
> distribution in the ring didn't change, so I'm assuming that wasn't the
> right thing to do.
> I also performed a nodetool repair -pr, but nothing changed either.
> Any advice?
>
> --
> Dominik Petrovic
>
