I just ran nodetool drain on one node of a 3-node cluster that was not serving any requests; the other nodes picked up the change in about 10 seconds.
On the node I drained:
INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,281 StorageService.java (line 474) Starting drain process
INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,282 MessagingService.java (line 348) Shutting down MessageService...
INFO [ACCEPT-sorb/192.168.34.31] 2010-09-30 15:18:03,289 MessagingService.java (line 529) MessagingService shutting down server thread.
INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,290 MessagingService.java (line 365) Shutdown complete (no further commands will be processed)
INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,339 StorageService.java (line 474) Node is drained
On one of the other nodes:
INFO [Timer-0] 2010-09-30 15:18:12,753 Gossiper.java (line 196) InetAddress /192.168.34.31 is now dead.
DEBUG [Timer-0] 2010-09-30 15:18:12,753 MessagingService.java (line 134) Resetting pool for /192.168.34.31
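To put a number on "about 10 seconds", the two timestamps can be subtracted directly. This is only a small sketch: it assumes the clocks on the two nodes were in sync, and the names parse_log_time and seconds_between are mine, not anything from Cassandra.

```python
from datetime import datetime

# Timestamps copied from the log excerpts above.
DRAIN_STARTED = "2010-09-30 15:18:03,281"  # "Starting drain process" on the drained node
MARKED_DEAD = "2010-09-30 15:18:12,753"    # "is now dead" on one of the other nodes

# Format used by these log lines: date, time, then a comma and milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(stamp: str) -> datetime:
    """Parse a timestamp in the format shown in the log lines above."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two log timestamps."""
    return (parse_log_time(end) - parse_log_time(start)).total_seconds()


# Assumption: both nodes have synchronized clocks, otherwise this is only a rough figure.
delay = seconds_between(DRAIN_STARTED, MARKED_DEAD)
print(f"other node marked it dead after {delay:.3f} s")
```

For these two lines that comes out at a little under 9.5 seconds from the start of the drain to the Gossiper marking the node dead.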
Either way, I would say it's safer to drain the node first, because it writes out the SSTables and drains the commit log, so after the reboot the server will not need to replay the log. That may be a good thing if something goes wrong with the upgrade.
My guess is:
- drain the node
- other nodes can still read from it, but it will actively reject writes (because the Messaging Service is down), so there should be no timeouts.
- wait until the down state of the node is propagated around the cluster, then shut it down.
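
The steps above could be scripted roughly as follows. This is an untested sketch of my guess, not a verified procedure: it assumes nodetool is on the PATH, that the target is given as the same address that appears in the ring listing, and that `nodetool ring` prints one line per node beginning with the address and containing the word Up or Down. The function names (nodetool, sees_node_down, drain_and_wait) are made up for illustration.

```python
import subprocess
import time


def nodetool(host, command):
    """Run `nodetool -h <host> <command>` and return its standard output."""
    result = subprocess.run(
        ["nodetool", "-h", host, command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def sees_node_down(ring_output, address):
    """Return True if the ring listing shows `address` as Down.

    Assumption: each node line starts with the address and has a
    separate Up/Down status column.
    """
    for line in ring_output.splitlines():
        fields = line.split()
        if fields and fields[0] == address:
            return "Down" in fields
    return False


def drain_and_wait(target, observers, run=nodetool, timeout=60.0,
                   poll=1.0, sleep=time.sleep, clock=time.monotonic):
    """Drain `target`, then poll `observers` until all of them see it as Down.

    Returns True once every observer reports the node Down, or False if
    `timeout` seconds pass first. The node is NOT shut down here; that
    stays a manual step.
    """
    run(target, "drain")
    deadline = clock() + timeout
    pending = set(observers)
    while pending:
        # Keep only the observers that still think the target is up.
        pending = {h for h in pending
                   if not sees_node_down(run(h, "ring"), target)}
        if not pending:
            break
        if clock() >= deadline:
            return False
        sleep(poll)
    return True


# Hypothetical usage (observer addresses are placeholders):
# drain_and_wait("192.168.34.31", ["<other-node-1>", "<other-node-2>"])
```

The `run`, `sleep` and `clock` parameters are only there so the loop can be exercised with fakes before pointing it at a real cluster.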
I may be able to test out the theory under a light load later today or tomorrow. Anyone else have any thoughts?
On 30 Sep, 2010, at 02:54 PM, Justin Sanders <email@example.com> wrote: