cassandra-user mailing list archives

From Robert Stupp <>
Subject Re: Compaction before Decommission and Bootstrapping
Date Sun, 17 Aug 2014 15:17:52 GMT
In a few words:
Bootstrap one node at a time
Wait for bootstrap to complete
Next node
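
The steps above could be scripted roughly as follows. This is only a sketch under stated assumptions, not a tested production procedure: it assumes `nodetool` is on the PATH, that a node reported as `UJ` (Up/Joining) by `nodetool status` is still bootstrapping and one reported as `UN` (Up/Normal) has finished, and that `start_node` is a hypothetical callback standing in for however Cassandra gets started on the new host.

```python
import subprocess
import time


def node_states(status_output):
    """Map each node address to its two-letter state code (e.g. 'UN', 'UJ')."""
    states = {}
    for line in status_output.splitlines():
        parts = line.split()
        # Node rows start with a status/state pair: U or D, then N, L, J or M.
        if (len(parts) >= 2 and len(parts[0]) == 2
                and parts[0][0] in "UD" and parts[0][1] in "NLJM"):
            states[parts[1]] = parts[0]
    return states


def nodetool_status():
    """Assumption: nodetool is on PATH and can reach a live node."""
    result = subprocess.run(["nodetool", "status"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def wait_until_normal(address, read_status=nodetool_status,
                      poll_seconds=60, sleep=time.sleep):
    """Block until `address` is reported as UN (Up/Normal)."""
    while node_states(read_status()).get(address) != "UN":
        sleep(poll_seconds)


def bootstrap_one_by_one(new_nodes, start_node, read_status=nodetool_status,
                         poll_seconds=60, sleep=time.sleep):
    """Start each new node only after the previous one has finished joining."""
    for address in new_nodes:
        start_node(address)  # hypothetical: start Cassandra on that host
        wait_until_normal(address, read_status, poll_seconds, sleep)
```

A call might look like `bootstrap_one_by_one(["10.0.0.2", "10.0.0.3"], start_node=my_start_function)`, where the addresses and `my_start_function` are placeholders. The status reader and sleep function are parameters so the loop can be exercised without a live cluster.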

More details: (C* 2.0)

Before decommissioning: nodetool cleanup

Don't forget to do repairs (one node at a time) - this should be a regular admin task
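
As an illustration only, the cleanup, decommission and repair advice could be wrapped in a small helper like the one below. It is a sketch that assumes `nodetool` is on the PATH, that each subcommand returns only once its work is done, and that it exits non-zero on failure. The function names are made up for this example.

```python
import subprocess


def run_nodetool(host, *args):
    """Run one nodetool subcommand against `host`; raise if it exits non-zero."""
    subprocess.run(["nodetool", "-h", host, *args], check=True)


def repair_one_at_a_time(hosts, run=run_nodetool):
    """Repair nodes sequentially, never two at once; stop at the first failure."""
    for host in hosts:
        run(host, "repair")


def cleanup_then_decommission(host, run=run_nodetool):
    """Run cleanup on the node first, then decommission it."""
    run(host, "cleanup")
    run(host, "decommission")
```

Passing `run` as a parameter keeps the ordering logic separate from the actual command execution, so the sequence can be checked with a stub before pointing it at real nodes.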

Sent from my iPhone 

> On 17.08.2014 at 15:46, Maxime <> wrote:
> Is there some unwritten wisdom with regard to the use of 'nodetool compact' before bootstrapping new nodes and decommissioning old ones?
> TL;DR:
> I've been spending the last few days trying to move a cluster on DigitalOcean 2GB machines to 4GB machines (same provider). To do so I wanted to create the new nodes, bootstrap them, then decommission the old ones (one by one seems to be the only available option).
> The bootstrapping was failing; eventually I figured out it was somehow related to the TombstoneOverwhelmingException on the new nodes. I issued a 'nodetool compact' on the entire cluster to try to minimize the number of tombstones. Once that was done I was able to bootstrap all my new nodes.
> Now is the time to decommission. From the very first node I tried to decommission, I've been getting 1 node dying after an almost endless loop of "GC for ConcurrentMarkSweep" messages, showing the heap getting fuller and fuller until the node dies. On one node I was able to bump the MAX_HEAP_SIZE by 400MB and get it to work (it was a 4GB node), but now I'm getting the same symptoms on a 2GB node where the heap is already as big as it can be before the OS itself runs out of RAM, so I can't expand the MAX_HEAP_SIZE. It would seem I have really painted myself into a scrap-the-cluster kind of corner.
> Not knowing the inner workings of Cassandra's bootstrap and decommission mechanisms means all I can do is make an educated guess that perhaps doing another 'nodetool compact' on the nodes I'm about to decommission might help. However, I have not found any wisdom or documentation on anything relating to this, which I find surprising as I can't be the first to have had this problem.
> Does anyone have a real-world production process for efficiently and reliably bootstrapping and decommissioning nodes in a cluster? Seems it might look like <compact all>, <bootstrap one-by-one>, <compact all>, <decommission one-by-one (really?!?)>. Or are all my problems due to me running on "hardware" that doesn't have resources (RAM, CPU) to spare in the first place?
> Thanks
