cassandra-commits mailing list archives

From "Rick Branson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6961) nodes should go into hibernate when join_ring is false
Date Wed, 09 Apr 2014 20:18:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964626#comment-13964626 ]

Rick Branson commented on CASSANDRA-6961:
-----------------------------------------

Very happy about this. It will make a ton of things easier from an operations perspective
(bringing up new DCs, bringing up hosts after long-ish maintenance), but we are also interested
in using this to potentially have dedicated coordinator nodes that are separate from storage.
We find ourselves CPU-bound on the more capacity-constrained and expensive storage-class hardware.
Most of this CPU time is spent on request coordination. Moving this work to cheap "diskless"
application-class hardware would be far preferable and would allow us to maximize the capacity of
our storage nodes.

> nodes should go into hibernate when join_ring is false
> ------------------------------------------------------
>
>                 Key: CASSANDRA-6961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6961
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.0.7, 2.1 beta2
>
>         Attachments: 6961.txt
>
>
> The impetus here is this: a node that was down for some period and comes back can serve
> stale information.  We know from CASSANDRA-768 that we can't just wait for hints, and know
> that tangentially related CASSANDRA-3569 prevents us from having the node in a down (from
> the FD's POV) state handle streaming.
> We can *almost* set join_ring to false, then repair, and then join the ring to narrow
> the window (actually, you can do this and everything succeeds because the node doesn't know
> it's a member yet, which is probably a bit of a bug.)  If instead we modified this to put
> the node in hibernate, like replace_address does, it could work almost like replace, except
> you could run a repair (manually) while in the hibernate state, and then flip to normal when
> it's done.
> This won't prevent the staleness 100%, but it will greatly reduce the chance if the node
> has been down a significant amount of time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
