cassandra-user mailing list archives

From: Sebastian Estevez <sebastian.este...@datastax.com>
Subject: Re: Post mortem of a large Cassandra datacenter migration.
Date: Mon, 12 Oct 2015 15:44:00 GMT
For 1 and 3, have you looked at CASSANDRA-8611
<https://issues.apache.org/jira/browse/CASSANDRA-8611>

For 4, you don't need to attach a profiler to check if GC is a problem.
Just grep the system log for GCInspector.
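For example, something like this shows the recent pauses (adjust the log
path for your install):

    grep GCInspector /var/log/cassandra/system.log | tail -20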

All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com


DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
DataStax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world's
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Fri, Oct 9, 2015 at 8:17 PM, Kevin Burton <burton@spinn3r.com> wrote:

> We just finished up a pretty large migration of about 30 Cassandra boxes
> to a new datacenter.
>
> We'll be migrating to about 60 boxes in the next month, so scalability
> (and being able to do it cleanly) is important.
>
> We also completed an Elasticsearch migration at the same time.  The ES
> migration worked fine.  A few small problems with it doing silly things
> like relocating shards too often, but all in all it was fairly painless.
>
> At one point we were doing 200 shard reallocations in parallel and pushing
> about 2-4Gbit...
>
> The Cassandra migration, however, was a LOT harder.
>
> One quick thing I wanted to point out - we're hiring.  So if you're a
> killer Java Devops guy drop me an email....
>
> Anyway.  Back to the story.
>
> Obviously we did a bunch of research beforehand to make sure we had
> plenty of bandwidth.  This was a migration from Washington DC to Germany.
>
> Using iperf, we could consistently push about 2Gbit back and forth between
> DC and Germany.  That included TCP, once we switched to using large window
> sizes.
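>
> For reference, the kind of test we ran looked roughly like this (the
> window size and parallelism are just illustrative values):
>
>     # receiving side
>     iperf -s -w 8M
>     # sending side
>     iperf -c <remote-host> -w 8M -P 4 -t 60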
>
> The big problem we had was that we could only bootstrap one node at a
> time.  This ends up taking a LOT more time because you have to keep
> checking on each node before you can start the next one.
>
> I imagine one could write a coordinator script, but we had so many problems
> with Cassandra that it wouldn't have worked if we had tried.
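>
> If we had trusted it, the coordinator would have been something dumb like
> this (a rough sketch, not something we actually ran; the IPs and service
> name are placeholders):
>
>     #!/bin/bash
>     # Bring up one new node at a time and wait for it to reach UN
>     # (Up/Normal) in nodetool status before starting the next one.
>     NEW_NODES="10.1.0.11 10.1.0.12 10.1.0.13"
>     for ip in $NEW_NODES; do
>       ssh "$ip" 'sudo service cassandra start'
>       until nodetool status | awk -v ip="$ip" '$1 == "UN" && $2 == ip' | grep -q .; do
>         sleep 60   # still joining (UJ) or not visible yet
>       done
>       sleep 120    # let gossip settle before kicking off the next bootstrap
>     done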
>
> We had 2-3 main problems.
>
> 1.  Sometimes streams would just stop and lock up, with no explanation why.
> They would just lock up and not resume.  We'd wait 10-15 minutes with no
> response.  This would require us to abort and retry.  Had we updated to
> Cassandra 2.2 beforehand, I think the new resume support would have helped.
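>
> For anyone hitting the same thing, about the only visibility you get into
> whether a stream is still moving is nodetool netstats on both ends, e.g.:
>
>     nodetool netstats | grep -v "100%"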
>
> 2.  Some of our keyspaces created by Thrift caused exceptions regarding
> "too few resources" when trying to bootstrap. Dropping these keyspaces
> fixed the problem.  They were just test keyspaces so it didn't matter.
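>
> Dropping them is just a one-liner from cqlsh (the keyspace name here is
> made up):
>
>     echo "DROP KEYSPACE IF EXISTS load_test;" | cqlsh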
>
> 3.  Because of #1, it's probably better to make sure you have 2x or more
> disk space on the remote end before you do the migration.  This way you can
> boot the same number of nodes you had before and just decommission the old
> ones quickly (or rather, use nodetool removenode - see #5 below).
>
> 4.  We're not sure why, but our OLDER machines kept locking up during this
> process.  This kept requiring us to do a rolling restart on all the older
> nodes.  We suspect this was GC, and we were seeing single cores pegged at
> 100%.  I didn't have time to attach a profiler as we were all burned out at
> this point and just wanted to get it over with.  This problem meant that #1
> was exacerbated because our old boxes would either refuse to send streams
> or refuse to accept them.  It seemed to get better when we upgraded the
> older boxes to use Java 8.
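>
> Short of a real profiler, something like jstat on the Cassandra pid would
> probably have confirmed whether those pegged cores were GC (the pid lookup
> is just an example):
>
>     jstat -gcutil $(pgrep -f CassandraDaemon) 1000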
>
> 5.  Don't use nodetool decommission if you have a large number of nodes.
> Instead, use nodetool removenode.  It's MUCH faster because it does the
> M-N re-replication between the remaining nodes directly.  The downside is
> that you're running with one fewer replica while it happens.  However, it
> was easily 20-30x faster.  This probably saved me about 5 hours of sleep!
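>
> For anyone else doing this, the rough flow is: stop the old node, grab its
> Host ID from nodetool status, then from any live node:
>
>     nodetool removenode <host-id-of-the-dead-node>
>     nodetool removenode status   # check on progress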
>
> In hindsight, I'm not sure what we would have done differently.  Maybe
> bought more boxes.  Maybe upgraded to Cassandra 2.2 and probably Java 8 as
> well.
>
> Setting it up as a proper multi-datacenter migration might have worked out
> better too.
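>
> i.e. add the new DC to each keyspace's replication and then stream
> everything over with rebuild instead of bootstrapping node by node.  Very
> roughly (the keyspace and DC names are made up):
>
>     echo "ALTER KEYSPACE my_ks WITH replication =
>       {'class': 'NetworkTopologyStrategy', 'us_dc': 3, 'de_dc': 3};" | cqlsh
>     # then on every node in the new DC:
>     nodetool rebuild -- us_dc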
>
> Kevin
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
>
>
