Hi Kenneth,

Thanks for your interest in helping. I had to make a decision quickly because it was a production cluster. Long story short, I let the cluster finish the decommission process before touching it. When the decommissioned node left the cluster, I did a rolling restart and the nodes started behaving normally again without errors. Auto-compaction also resumed, and all nodes had accumulated a lot of files to compact. Then I performed a rolling upgrade from 3.11.1 to 3.11.4, which went very smoothly.

In retrospect, to answer your questions:

> Was the cluster running ok before decommissioning the node?
Yes.

> Why were you decommissioning the node?
Management decision; we just wanted to shrink the cluster.

> Were you upgrading from 3.11.1 to 3.11.4?
No, that was not the initial intention. I decided to upgrade after I realized the rest of the nodes had hit this bug:
"Prevent compaction strategies from looping indefinitely" CASSANDRA-14079

Thanks again!


On Thu, Feb 28, 2019 at 10:45 AM Kenneth Brotman <kenbrotman@yahoo.com.invalid> wrote:

Hi John,

Was the cluster running ok before decommissioning the node?

Why were you decommissioning the node?

Were you upgrading from 3.11.1 to 3.11.4?

From: Ioannis Zafiropoulos [mailto:johnzaf@gmail.com]
Sent: Wednesday, February 27, 2019 7:33 AM
To: user@cassandra.apache.org
Subject: Upgrade 3.11.1 to 3.11.4

Hi all,

During the decommission of a node in a production cluster (9 nodes), we are having compaction issues on the remaining nodes, and I have some questions about that:

One remaining node, which has stopped compacting due to a bug in 3.11.1, has received all the streamed files from the decommissioning node (decommissioning is still in progress for the rest of the cluster). Could I upgrade this node to 3.11.4 and restart it?

Some other nodes that are still receiving files appear, according to nodetool tpstats, to be doing little or no auto-compaction. Should I wait for streaming to complete, or should I upgrade these nodes as well and restart them? What would happen if I bounce such a node? Would the whole decommissioning process fail?

Would you recommend eventually doing a rolling upgrade to 3.11.4, or choosing another version?

Thanks in advance for your help,

John Zaf