If what you need is a replacement node to increase the hardware specs, I'd recommend an 'immediate node replacement' as described here: http://mrcalonso.com/cassandra-instantaneous-in-place-node-replacement/

Basically the process just rsyncs the relevant files (data directories + configuration) from the old node to the new one, then stops the old node and starts the new one. As the configuration is identical (only the IP changes), the new node joins the ring as if it were the old one, and there's no need for any bootstrapping.
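A rough sketch of that procedure (the paths, the hypothetical new-node IP, and the service commands are assumptions based on a default package install; adjust for your setup):

```shell
# Hypothetical address of the replacement node
NEW=10.0.0.2

# 1. Copy data and configuration while the old node is still serving traffic
rsync -avz /var/lib/cassandra/data/ $NEW:/var/lib/cassandra/data/
rsync -avz /etc/cassandra/          $NEW:/etc/cassandra/

# 2. Drain and stop the old node, then do a final delta sync
#    (only the files changed since step 1 are transferred, so this is quick)
nodetool drain && sudo service cassandra stop
rsync -avz /var/lib/cassandra/data/ $NEW:/var/lib/cassandra/data/

# 3. Start Cassandra on the new node; with identical tokens and
#    configuration it rejoins the ring in place of the old one
ssh $NEW 'sudo service cassandra start'
```

The two-pass rsync keeps the downtime window small: the bulk of the data moves while the old node is still up, and the final sync after `nodetool drain` only has to catch up on the delta.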

BTW, are you using vnodes?


Carlos Alonso | Software Engineer | @calonso

On 3 November 2016 at 15:46, Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:
On Thu, Nov 3, 2016 at 2:32 PM, Mike Torra <mtorra@demandware.com> wrote:
Hi Alex - I do monitor sstable counts and pending compactions, but probably not closely enough. In 3/4 regions the cluster is running in, both counts are very high - ~30-40k sstables for one particular CF, and on many nodes >1k pending compactions.

It is generally a good idea to keep the number of pending compactions minimal.  We usually see it close to zero on every node during normal operations, and at most a few tens during maintenance such as repair.

I had noticed this before, but I didn't have a good sense of what a "high" number for these values was.

I would say anything higher than 20 probably requires someone to have a look, and over 1k is very troublesome.
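For monitoring those numbers, nodetool exposes both directly (the keyspace name below is a placeholder for your own):

```shell
# "pending tasks" here is the pending-compactions count to watch
nodetool compactionstats

# per-table "SSTable count" for a given keyspace (placeholder name)
nodetool tablestats my_keyspace
```

Graphing both per node makes it easy to spot a node that is falling behind before it gets anywhere near the thousands.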

It makes sense to me why this would cause the issues I've seen. After increasing concurrent_compactors and compaction_throughput_mb_per_sec (to 8 and 64 MB/s, respectively), I'm starting to see those counts go down steadily. Hopefully that will resolve the OOM issues, but it looks like it will take a while for compactions to catch up.
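For reference, the throughput setting can be bumped at runtime via nodetool, while concurrent_compactors requires a yaml change and restart (a sketch, assuming the values from above):

```shell
# Raise the compaction throughput cap at runtime, no restart needed
nodetool setcompactionthroughput 64
nodetool getcompactionthroughput   # verify the new value

# concurrent_compactors is read from cassandra.yaml at startup:
#   concurrent_compactors: 8
#   compaction_throughput_mb_per_sec: 64
```

Changing it live first lets you watch the effect on pending compactions before committing the values to cassandra.yaml.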

Thanks for the suggestions, Alex

Welcome. :-)