cassandra-commits mailing list archives

From "Erik Onnen (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-1451) Shutting down a node "cleanly" still kills client requests when the node goes down
Date Mon, 22 Nov 2010 22:52:15 GMT


Erik Onnen commented on CASSANDRA-1451:

Here's what we observed that led to this being discussed on IRC.

After nodetool drain is executed, a node can no longer accept new write operations. This
is problematic for several reasons in the current implementation:

1) The drained node still accepts writes. It won't apply them locally, but it will still forward
them to remote endpoints. In 0.6.8 a write can therefore succeed on the remote replicas even though
a timeout error is reported back to the client when the local write fails, so the client
concludes that the write failed when it in fact succeeded.
2) The drained node can still process some writes, just not those for which it is a natural
endpoint. Clients therefore see non-deterministic behavior: some writes succeed while
others fail.
3) The drained node can still process reads. This leads some upstream client libraries to treat
the node as healthy when in reality it should be shunned (at least for writes).
4) When a local write is rejected, it surfaces as a timeout exception, the same behavior seen
when the pending read/write stage queues are full. In many cases it is appropriate for a client
to retry when the read/write stages are full, but because both situations look identical to the
client, it cannot distinguish backed-up read/write stages from a local node that is simply
rejecting the write because it is draining. Clients can't help themselves in this case;
they're left to guess, which is bad.
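To make point 4 concrete, here is a minimal client-side sketch. It assumes a hypothetical client wrapper; none of the names below (`WriteOutcome`, `NodeDrainingError`, `StageOverloadedError`, `classify`) are Cassandra or Thrift APIs. It contrasts what a client can decide when a draining node and an overloaded node report distinct (hypothetical) errors versus when both surface as the same timeout.

```python
from enum import Enum


class WriteOutcome(Enum):
    RETRY_SAME_NODE = "retry_same_node"  # node is healthy but its stages are backed up
    FAIL_OVER = "fail_over"              # node is draining; send the write elsewhere
    UNKNOWN = "unknown"                  # client cannot tell which case applies


class NodeDrainingError(Exception):
    """Hypothetical error a draining node could return instead of a timeout."""


class StageOverloadedError(Exception):
    """Hypothetical error for full read/write stage queues."""


def classify(exc: Exception) -> WriteOutcome:
    """Map a failed write to the recovery action a client could take."""
    # With distinct error types, the client can pick the right recovery.
    if isinstance(exc, NodeDrainingError):
        return WriteOutcome.FAIL_OVER
    if isinstance(exc, StageOverloadedError):
        return WriteOutcome.RETRY_SAME_NODE
    # A bare timeout covers both situations (and, per point 1, possibly a
    # write that actually reached the remote replicas), so the client must guess.
    if isinstance(exc, TimeoutError):
        return WriteOutcome.UNKNOWN
    # Anything else is not a case this sketch handles.
    raise exc
```

The design point is that only a distinguishable signal lets the client self-help; a shared timeout collapses both cases into `UNKNOWN`.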

> Shutting down a node "cleanly" still kills client requests when the node goes down
> ----------------------------------------------------------------------------------
>                 Key: CASSANDRA-1451
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6.5
>            Reporter: David King
> Shutting down a node, even cleanly through drain, still kills some requests with
> TimeoutExceptions. Ideally, operations would not be sent at all to nodes that are known to
> be shutting down, perhaps by shutting down gossip before starting the draining process.
> Other nodes will still need to have the phi convict threshold exceeded, but presumably
> that's usually shorter than the drain.
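The timing argument above depends on how quickly other nodes convict the stopped node. As a rough sketch, assuming the exponential inter-arrival simplification of the phi accrual failure detector, phi grows linearly with the time since the last heartbeat: phi(t) = t / (mean * ln 10). The helper names, the 1 s mean heartbeat interval and the threshold of 8 below are illustrative assumptions, not values taken from this issue.

```python
import math


def phi(elapsed_s: float, mean_interval_s: float) -> float:
    """Suspicion level under an exponential inter-arrival model.

    P(no heartbeat yet after t) = exp(-t / mean), and phi = -log10 of that,
    which simplifies to t / (mean * ln 10).
    """
    return elapsed_s / (mean_interval_s * math.log(10))


def seconds_until_convicted(threshold: float, mean_interval_s: float) -> float:
    """Invert phi(t) = threshold to get the time before peers mark the node down."""
    return threshold * mean_interval_s * math.log(10)


# Illustrative (assumed) values: 1 s mean heartbeat interval, convict threshold 8.
t = seconds_until_convicted(8.0, 1.0)
print(f"convicted after ~{t:.1f} s")  # convicted after ~18.4 s
```

Under these assumptions, if gossip is stopped before draining starts, this is roughly the window during which peers may still route requests to the node; the report's argument is that it is usually shorter than the drain itself.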

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
