Hi, Evelyn!

I've found the following messages:

INFO RepairRunnable.java Starting repair command #41, repairing keyspace XXX with repair options (parallelism: parallel, primary range: false, incremental: false, job threads: 1, ColumnFamilies: [YYY], dataCenters: [], hosts: [], # of ranges: 768)
INFO CompactionExecutor:6 CompactionManager.java Starting anticompaction for XXX.YYY on 5132/5846 sstables

After that, many similar messages follow:
SSTable BigTableReader(path='/mnt/cassandra/data/XXX/YYY-4c12fd9029e611e8810ac73ddacb37d1/lb-12688-big-Data.db') fully contained in range (-9223372036854775808,-9223372036854775808], mutating repairedAt instead of anticompacting
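In case it's useful, these messages can be pulled out with a plain grep (log path assumed to be the default /var/log/cassandra/system.log; adjust to your installation):

grep -iE 'anticompaction|mutating repairedAt' /var/log/cassandra/system.log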

Does it mean that anti-compaction is not the cause?

2018-04-05 18:01 GMT+05:00 Evelyn Smith <u5015159@gmail.com>:
It might not be what caused it here, but check your logs for anti-compactions.


On 5 Apr 2018, at 8:35 pm, Dmitry Simonov <dimmoborgir@gmail.com> wrote:

Thank you!
I'll check this out.

2018-04-05 15:00 GMT+05:00 Alexander Dejanovski <alex@thelastpickle.com>:
40 pending compactions is pretty high; you should have far fewer than that most of the time, otherwise it means that compaction is not keeping up with your write rate.

If you indeed have SSDs for data storage, increase your compaction throughput to 100 or 200 (depending on how the CPUs handle the load). You can experiment with compaction throughput using: nodetool setcompactionthroughput 100
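If it helps, you can check the value currently in effect with:

nodetool getcompactionthroughput

Note that setcompactionthroughput only changes the running node; once you've settled on a value, also set compaction_throughput_mb_per_sec in cassandra.yaml so it survives a restart.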

You can raise the number of concurrent compactors as well and set it to a value between 4 and 6 if you have at least 8 cores and CPUs aren't overwhelmed.
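On 2.2 that is, as far as I know, a cassandra.yaml setting, so it needs a rolling restart. A minimal sketch (example values only, tune them to your core count and disks):

# cassandra.yaml
concurrent_compactors: 4
compaction_throughput_mb_per_sec: 100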

I'm not sure why you ended up with only one node having 6k SSTables and not the others, but you should apply the above changes so that you can lower the number of pending compactions and see if it prevents the issue from happening again.

Cheers,


On Thu, Apr 5, 2018 at 11:33 AM Dmitry Simonov <dimmoborgir@gmail.com> wrote:
Hi, Alexander!

SizeTieredCompactionStrategy is used for all CFs in the problematic keyspace.
Current compaction throughput is 16 MB/s (default value).

We always have about 40 pending and 2 active "CompactionExecutor" tasks in "tpstats", mostly because of another (bigger) keyspace in this cluster.
The situation is the same on each node.
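(We're reading those numbers from something like:

nodetool tpstats | grep -E 'Pool Name|CompactionExecutor'

which shows the Active and Pending columns for that pool.)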

According to "nodetool compactionhistory", compactions on this CF run (sometimes several times per day, sometimes one time per day, the last run was yesterday).
We run "repair -full" regulary for this keyspace (every 24 hours on each node), because gc_grace_seconds is set to 24 hours.

Should we consider increasing compaction throughput and "concurrent_compactors" (as recommended for SSDs) to keep "CompactionExecutor" pending tasks low?

2018-04-05 14:09 GMT+05:00 Alexander Dejanovski <alex@thelastpickle.com>:
Hi Dmitry,

Could you tell us which compaction strategy that table is currently using?
Also, what is the max compaction throughput, and is auto-compaction correctly enabled on that node?

Did you recently run repair ?

Thanks,

On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov <dimmoborgir@gmail.com> wrote:
Hello!

Could you please give some ideas on the following problem?

We have a cluster with 3 nodes, running Cassandra 2.2.11.

We've recently discovered high CPU usage on one cluster node. After some investigation, we found that the number of SSTables for one CF on that node is very big: 5800 SSTables, versus 3 SSTables on the other nodes.
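(The per-table SSTable count can be checked with something like the following, keyspace/table names redacted as above:

nodetool cfstats XXX.YYY | grep 'SSTable count')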

Data size in this keyspace was not very big: ~100-200 MB per node.

There is no such problem with other CFs of that keyspace.

nodetool compact solved the issue as a quick fix.
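For completeness, that was a manual major compaction of the affected table, along the lines of (names redacted as above):

nodetool compact XXX YYY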

But I'm wondering: what was the cause? How can we prevent it from happening again?

--
Best Regards,
Dmitry Simonov
--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting



--
Best Regards,
Dmitry Simonov
--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting



--
Best Regards,
Dmitry Simonov




--
Best Regards,
Dmitry Simonov