cassandra-user mailing list archives

From "Thakrar, Jayesh" <>
Subject Re: Question upon gracefully restarting c* node(s)
Date Wed, 10 Jan 2018 14:20:55 GMT
Just curious: aside from the "sleep", isn't all of this part of the shutdown command?
Is this an "opportunity" to improve C*?
In the RDBMSes, Hadoop and HBase systems I have worked with, stopping communication, flushing
memcache (HBase) and relinquishing ownership of data (HBase) are all part of the shutdown process.

From: Alain RODRIGUEZ <>
Date: Wednesday, January 10, 2018 at 6:19 AM
To: "user" <>
Subject: Re: Question upon gracefully restarting c* node(s)

I agree with the comments above. Cassandra is robust, and we are just talking about optimising
the process; nothing here is mandatory. Taken to the extreme, you could pull the node's power cable,
plug it back in and call it a restart, and it should not cause harm if your cluster is properly
tuned. Yet optimisations are welcome, as they reduce entropy and startup time. Plus we are civilized
operators, not barbarians, aren't we ;-)? It's just cleaner and more efficient.
Also, historically, draining was mandatory when using counters, to prevent over-counting, as
counters are not idempotent (I am not sure this is still true nowadays).

The last time I asked this very question, I ended up building this command, which I have been
using ever since:

`date && nodetool disablebinary && nodetool disablegossip && sleep 10 && nodetool flush && nodetool drain && sleep 10 && sudo service cassandra restart`

It does the following:

- Print the date for the record
- Stop all client transports, so the node stops listening for clients. I have never heard of
a benefit of shutting down the gossip protocol, and so never did so; it might be better, but I
can't really say.
- After a short while, no clients are using the node, and calling drain flushes memtables and
recycles the commitlog, as Kurt detailed above. I add a 'flush' first because I haven't been that
lucky with drain in the past: sometimes it did not work at all, sometimes it did not clean the
commitlogs. I believe flushing first makes this restart command more robust.
- Finally, restart the service.
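The one-liner above can be sketched as a small bash function that keeps the same order of steps and stops at the first step that fails. This is a minimal sketch, not Alain's actual script: the `NODETOOL`, `PAUSE_SECS` and `RESTART_CMD` environment variables are hypothetical overrides added here so the sequence can be dry-run, and the defaults simply mirror the original command (`sudo service cassandra restart` assumes a service-managed install).

```shell
#!/usr/bin/env bash
# Restart sequence as a function that stops at the first failing step.
# NODETOOL, PAUSE_SECS and RESTART_CMD are hypothetical overrides for dry runs;
# the defaults mirror the original one-liner. Command strings are expanded
# unquoted on purpose so that multi-word values split into words.

graceful_restart() {
  local nodetool="${NODETOOL:-nodetool}"
  local pause="${PAUSE_SECS:-10}"
  local restart_cmd="${RESTART_CMD:-sudo service cassandra restart}"

  date || return                       # print the date for the record
  $nodetool disablebinary || return    # stop listening for CQL clients
  $nodetool disablegossip || return    # stop gossip so peers mark the node down
  sleep "$pause" || return             # let in-flight requests finish
  $nodetool flush || return            # flush memtables first, as a safety net
  $nodetool drain || return            # flush, stop writes, recycle commitlog
  sleep "$pause" || return
  $restart_cmd                         # finally restart the service
}
```

For a dry run, `NODETOOL=echo PAUSE_SECS=0 RESTART_CMD="echo restart" graceful_restart` prints each step instead of executing it; calling `graceful_restart` with no overrides runs the real commands. Using `|| return` rather than `set -e` keeps the stop-on-failure behaviour even when the function is called from an `if` or inside `$(...)`.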

I don't think there is only one good way to do this. Also, doing it wrong is often not such
a big deal.

Alain Rodriguez - @arodream -<>
France / Spain

The Last Pickle - Apache Cassandra Consulting

2018-01-08 3:33 GMT+00:00 Jeff Jirsa <<>>:
The sequence does have some objective benefits, especially stopping transports and then gossip:
it tells everything that you’re going offline before you do, so requests won’t get dropped
or have to be speculatively retried against other replicas.

Jeff Jirsa
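To make the "transports first, then gossip" order explicit, one option is to disable each service in turn and check that nodetool reports it as stopped before moving on. This sketch assumes the `statusthrift`, `statusbinary` and `statusgossip` subcommands print `running` or `not running`, as in 2.1 and 3.x; Thrift was removed in 4.0, so drop it from the list there. `stop_and_confirm` and the `NODETOOL` override are hypothetical names.

```shell
#!/usr/bin/env bash
# Disable client transports first, then gossip, confirming each one
# reports "not running" before moving on to the next.
# NODETOOL is a hypothetical override (e.g. for a dry run).

stop_and_confirm() {
  local nodetool="${NODETOOL:-nodetool}"
  local service state
  # Remove "thrift" from this list on 4.0+, where the Thrift subcommands are gone.
  for service in thrift binary gossip; do
    $nodetool "disable$service" || return
    state=$($nodetool "status$service") || return
    if [ "$state" != "not running" ]; then
      echo "$service still reports: ${state:-<no output>}" >&2
      return 1
    fi
  done
}
```

Checking the status after each step costs one extra nodetool call per service, but it turns "I think gossip is off" into something the script has actually observed before the node is drained.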

On Jan 7, 2018, at 7:22 PM, kurt greaves <<>>
None are essential. Cassandra will shut down gracefully in any scenario as long as it's not
killed with SIGKILL. However, drain does have a few benefits over a normal shutdown: it stops
a few extra services (batchlog, compactions) and, importantly, it forces recycling of dirty
commitlog segments, so there are fewer commitlog files to replay on startup and startup time
is reduced.

A comment in the code for drain also indicates that it will wait for in-progress streaming
to complete, but I haven't managed to find (1) where this occurs, or (2) whether it actually
differs from a normal shutdown. Note that this is all with respect to 2.1. In 3.0.10 and 3.10,
drain and shutdown do more or less the same thing, although drain logs some extra messages.
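Since drain is what forces the commitlog recycling described above, a simple sanity check before stopping the process is to confirm the node reports itself as drained once `nodetool drain` returns. This sketch assumes the `Mode:` line printed by `nodetool netstats` (for example `Mode: NORMAL`, or `Mode: DRAINED` after a drain); `node_mode`, `drain_and_confirm` and the `NODETOOL` override are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Run drain, then confirm `nodetool netstats` reports "Mode: DRAINED".
# NODETOOL is a hypothetical override (e.g. for a dry run).

# Print the value of the first "Mode:" line of nodetool netstats.
node_mode() {
  local nodetool="${NODETOOL:-nodetool}"
  $nodetool netstats | awk -F': ' '/^Mode:/ { print $2; exit }'
}

drain_and_confirm() {
  local nodetool="${NODETOOL:-nodetool}"
  local mode
  $nodetool drain || return
  mode=$(node_mode)
  if [ "$mode" != "DRAINED" ]; then
    echo "unexpected mode after drain: ${mode:-unknown}" >&2
    return 1
  fi
}
```

Once `drain_and_confirm` succeeds, stop the service normally through the service manager (which sends SIGTERM) rather than with SIGKILL, which is the one case where Cassandra cannot shut down gracefully.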

On 2 January 2018 at 07:07, Jing Meng <<>>
Hi all.

Recently we made a change to our production C* cluster (2.1.18): we moved the commit log
onto the same SSD where the data is stored, which required restarting all nodes.

Before restarting a Cassandra node, we ran the following nodetool commands:
$ nodetool disablethrift && sleep 5
$ nodetool disablebinary && sleep 5
$ nodetool disablegossip && sleep 5
$ nodetool drain && sleep 5

It was "graceful" as expected (no significant errors found), but the process is still a mystery
to us: are the commands above "sufficient", and if so, why? The official doc (<>) did not help
with this operational detail, though "nodetool drain" is apparently essential.
