incubator-cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: Marking each node down before rolling restart
Date Thu, 30 Sep 2010 02:40:20 GMT
I just ran nodetool drain on a 3-node cluster that was not serving any requests; the other
nodes picked up the change in about 10 seconds.

On the node I drained:
 INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,281 StorageService.java (line 474) Starting drain process
 INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,282 MessagingService.java (line 348) Shutting down MessageService...
 INFO [ACCEPT-sorb/192.168.34.31] 2010-09-30 15:18:03,289 MessagingService.java (line 529) MessagingService shutting down server thread.
 INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,290 MessagingService.java (line 365) Shutdown complete (no further commands will be processed)
 INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,339 StorageService.java (line 474) Node is drained

On one of the others:
 INFO [Timer-0] 2010-09-30 15:18:12,753 Gossiper.java (line 196) InetAddress /192.168.34.31 is now dead.
DEBUG [Timer-0] 2010-09-30 15:18:12,753 MessagingService.java (line 134) Resetting pool for /192.168.34.31

Either way, I would say it's safer to drain the node first. Draining writes out the SSTables
and drains the log, so after the restart the server will not need to replay the log.
This may be a good thing in the event of an issue with the upgrade.

My guess is:
- drain the node
- other nodes can still read from it, but it will actively reject writes (because the Messaging
Service is down), so there should be no timeouts.
- wait until the down state of the node is propagated around the cluster, then shut it down. 
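
The steps above could be scripted roughly as follows. This is only a sketch of the guess in this message, not a tested procedure: `nodetool -h <host> drain` is the one real command used, while `drain_then_stop`, `seen_down`, and `stop` are hypothetical names. How you check that the rest of the cluster sees the node as down, and how you stop the Cassandra process, depend on your deployment, so both are passed in as callables.

```python
import subprocess
import time
from typing import Callable, List


def run_command(argv: List[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(argv, check=True)


def drain_then_stop(
    host: str,
    run: Callable[[List[str]], None],
    seen_down: Callable[[str], bool],  # hypothetical: True once peers report host as down
    stop: Callable[[str], None],       # hypothetical: stops the Cassandra process on host
    poll_seconds: float = 1.0,
    timeout_seconds: float = 60.0,
) -> bool:
    """Drain a node, wait for the cluster to mark it down, then stop it.

    Returns True if the node was stopped, False if the down state was not
    observed before the timeout (in which case the node is left running).
    """
    # Step 1: drain the node.
    run(["nodetool", "-h", host, "drain"])

    # Step 2: wait until the down state has propagated.
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if seen_down(host):
            # Step 3: only now shut the process down.
            stop(host)
            return True
        time.sleep(poll_seconds)
    return False
```

In a real run you would pass `run=run_command` and supply your own `seen_down` and `stop`. In the test above the other nodes noticed the drained node after roughly 10 seconds, so the default timeout of 60 seconds is an arbitrary but generous assumption.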
 
I may be able to test out the theory under a light load later today or tomorrow. Anyone else
have any thoughts?

Aaron


On 30 Sep 2010, at 02:54 PM, Justin Sanders <justin@bronto.com> wrote:

It seems to take about 15 seconds after killing a node for the other nodes to report it as
down.

We are running a 9-node cluster with RF=3, with all reads and writes at quorum. I was making
the same assumption you are: that an operation would complete fine at quorum with only one
node down, since the other two nodes would be able to respond.
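
For reference, the quorum arithmetic behind that assumption, as a minimal sketch (the function names are made up for illustration):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def replicas_that_can_be_down(replication_factor: int) -> int:
    """Replicas of a row that can be unavailable while QUORUM can still be met."""
    return replication_factor - quorum(replication_factor)


rf = 3
print(quorum(rf))                     # 2
print(replicas_that_can_be_down(rf))  # 1
```

This only shows that a quorum is still reachable with one replica down. It says nothing about whether the coordinator first waits on the node that was just killed, which is where the timeouts in this thread come from.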

Justin


On Wed, Sep 29, 2010 at 5:58 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
Ah, that was not exactly what you were after. I do not know how long it takes gossip and the
failure detector to detect a down node.

In your case, what is the CL (consistency level) you're using for reads, and what is your RF?
The hope would be that taking one node down at a time would leave enough servers running to
serve the request. AFAIK the coordinator will make a read request to the first node
responsible for the row, and only ask for a digest from the others. So there may be a case
where it has to time out reading from the first node before asking for the full data from the
others.

A hack solution may be to reduce rpc_timeout_in_ms.

May need some adult supervision to answer this one. 

Aaron


On 30 Sep 2010, at 10:45 AM, Aaron Morton <aaron@thelastpickle.com> wrote:

Try nodetool drain 

Flushes all memtables for a node and causes the node to stop accepting write operations. Read
operations will continue to work. This is typically used before upgrading a node to a new
version of Cassandra.
http://www.riptano.com/docs/0.6.5/utils/nodetool

Aaron


On 30 Sep 2010, at 10:15 AM, Justin Sanders <justin@justinjas.com> wrote:

I looked through the documentation but couldn't find anything. I was wondering if there
is a way to manually mark a node "down" in the cluster instead of killing the Cassandra process
and letting the other nodes figure out that the node is no longer up.

The reason I ask is that we are having an issue when we perform rolling restarts on the
cluster. Basically, read requests that come in on other nodes block while they wait for
the node that was just killed to be marked down. Until the other nodes realize the node is
offline, these requests throw a TimedOutException.

If I could mark the node as down ahead of time, this timeout period could be avoided. Any
help is appreciated.

Justin

