Thanks for offering to help. I had to make a decision quickly because it was a production cluster. Long story short, I let the cluster finish the decommission process before touching it. Once the decommissioned node had left the cluster, I did a rolling restart and the nodes started behaving again without errors. Auto-compaction also resumed, and all nodes had accumulated a lot of files to compact. I then performed a rolling upgrade from 3.11.1 to 3.11.4, which went very smoothly.

Was the cluster running ok before decommissioning the node?

It was a management decision; we just wanted to shrink the cluster.

Were you upgrading from 3.11.1 to 3.11.4?
No, that was not the initial intention. I decided to upgrade after I realized I had hit this bug on the rest of the nodes:
CASSANDRA-14079, "Prevent compaction strategies from looping indefinitely"

During a decommission on a production cluster (9 nodes), we are having compaction issues on the remaining nodes, and I have some questions about that:


One remaining node, which has stopped compacting due to a bug in 3.11.1, has received all of its streamed files from the decommissioning node (decommissioning is still in progress for the rest of the cluster). Could I upgrade this node to 3.11.4 and restart it?


Some other nodes that are still receiving files appear, according to nodetool tpstats, to be doing little to no auto-compaction. Should I wait for streaming to complete, or should I upgrade and restart these nodes as well? What would happen if I bounce such a node? Will the whole decommission process fail?


Would you recommend eventually doing a rolling upgrade to 3.11.4, or choosing another version?
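For reference, the nodetool tpstats check mentioned above could be scripted roughly as follows. This is only a sketch: the function names, the sample output, and its numbers are made up for illustration, and the "no active tasks but some pending" rule is an assumed heuristic, not an official way to detect the compaction bug.

```python
import subprocess

# Illustrative sample only: the numbers are invented, and the column layout
# is assumed to match what `nodetool tpstats` prints on 3.11.x.
SAMPLE_TPSTATS = """\
Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              0         0         123456         0                 0
CompactionExecutor                     0        57           8910         0                 0
"""


def read_tpstats():
    """Run `nodetool tpstats` on the local node and return its text output."""
    result = subprocess.run(
        ["nodetool", "tpstats"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_pool(text, pool="CompactionExecutor"):
    """Return (active, pending, completed) for the named pool, or None if absent."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == pool:
            try:
                return int(parts[1]), int(parts[2]), int(parts[3])
            except ValueError:
                return None
    return None


def compaction_looks_stalled(text):
    """Assumed heuristic: pending compactions exist but none are running."""
    stats = parse_pool(text)
    if stats is None:
        return False
    active, pending, _completed = stats
    return active == 0 and pending > 0


if __name__ == "__main__":
    # Uses the sample text; swap in read_tpstats() on a real node.
    print(parse_pool(SAMPLE_TPSTATS))
    print(compaction_looks_stalled(SAMPLE_TPSTATS))
```

Comparing the Completed counter across two runs a few minutes apart would probably be a more reliable signal than a single snapshot.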


