On Thu, Nov 3, 2016 at 2:32 PM, Mike Torra <email@example.com> wrote:

> Hi Alex - I do monitor sstable counts and pending compactions, but probably not closely enough. In 3/4 regions the cluster is running in, both counts are very high - ~30-40k sstables for one particular CF, and on many nodes >1k pending compactions.

It is generally a good idea to try to keep the number of pending compactions minimal. We usually see it is close to zero on every node during normal operations and less than some tens during maintenance such as repair.

> I had noticed this before, but I didn't have a good sense of what a "high" number for these values was.

I would say anything higher than 20 probably requires someone to have a look and over 1k is very troublesome.

> It makes sense to me why this would cause the issues I've seen. After increasing concurrent_compactors and compaction_throughput_mb_per_sec (to 8 and 64mb, respectively), I'm starting to see those counts go down steadily. Hopefully that will resolve the OOM issues, but it looks like it will take a while for compactions to catch up.
>
> Thanks for the suggestions, Alex
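
For anyone who wants to turn the rules of thumb above into a quick check, here is a minimal Python sketch. The thresholds (more than 20 deserves a look, more than 1k is troublesome) come straight from this thread and are not official Cassandra limits. The function names are hypothetical, and the parser assumes the monitoring output contains a line of the form "pending tasks: N", as `nodetool compactionstats` typically prints; check that against the output of your own version before relying on it.

```python
import re

# Rules of thumb from this thread, not official Cassandra limits.
ATTENTION_THRESHOLD = 20    # "anything higher than 20 probably requires a look"
TROUBLE_THRESHOLD = 1000    # "over 1k is very troublesome"


def classify_pending_compactions(pending: int) -> str:
    """Map a pending-compaction count to a rough severity label."""
    if pending < 0:
        raise ValueError("pending compactions cannot be negative")
    if pending > TROUBLE_THRESHOLD:
        return "troublesome"
    if pending > ATTENTION_THRESHOLD:
        return "needs a look"
    return "ok"


def parse_pending_tasks(text: str) -> int:
    """Extract N from a 'pending tasks: N' line (assumed output format)."""
    match = re.search(r"^pending tasks:\s*(\d+)", text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))
```

Feeding the captured text of one node's compaction stats through `parse_pending_tasks` and then `classify_pending_compactions` gives a per-node label that can be alerted on. The cutoffs are deliberately kept as named constants so they can be adjusted for a cluster where repairs routinely push the count into the tens.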