cassandra-commits mailing list archives

From "Anuj Wadehra (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10446) Run repair with down replicas
Date Wed, 20 Jan 2016 03:07:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107876#comment-15107876 ]

Anuj Wadehra edited comment on CASSANDRA-10446 at 1/20/16 3:06 AM:
-------------------------------------------------------------------

I think this is an issue with the way we handle the downed-replica scenario in repairs.
We should increase the priority and change the type from Improvement to Bug so that it gets
due attention.

Consider the following scenario and flow of events, which demonstrate the importance of this issue:

Scenario: I have a 20-node cluster, RF=5, QUORUM reads and writes, and a gc grace period of
20 days. I believe my Cassandra cluster is fault tolerant and can afford 2 node failures.

Suddenly, one node goes down due to a hardware issue. The failed node prevents repair on
many nodes in the cluster, because it holds roughly a 5/20 share of the total data: 1/20
which it owns as primary, and 4/20 which it stores as replicas of data owned by other nodes.
Now it is 10 days since the node went down, most of the nodes are not being repaired, and it
is DECISION time for me. I am not sure how soon the issue will be fixed; perhaps in the next
2 days, i.e. 8 days before the gc grace period ends. So I should not remove the node early
and add it back, as that would cause significant and unnecessary streaming due to token
re-arrangement. At the same time, if I do not remove the failed node now, i.e. 10 days after
the failure (well before gc grace), and instead wait for the issue to be resolved, my entire
system's health comes into question and it becomes a panic situation: most of the data has
not been repaired in the last 10 days and gc grace is approaching. I need sufficient time to
repair all nodes before the gc grace period ends. The numbers are sketched below.
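To make the numbers concrete, here is a back-of-the-envelope sketch in plain Java, written
for this comment only; the uniform-token-distribution assumption is mine, not something the
ticket states:

public class RepairImpact {
    public static void main(String[] args) {
        int nodes = 20;        // cluster size from the scenario
        int rf = 5;            // replication factor
        int gcGraceDays = 20;  // gc grace period, in days
        int daysDown = 10;     // how long the node has been down

        // Assuming tokens are distributed uniformly, each node stores
        // rf/nodes of the total data: 1/nodes as the primary owner plus
        // (rf - 1)/nodes as replicas of ranges owned by other nodes.
        double share = (double) rf / nodes;
        System.out.printf("One node's share of total data: %.0f%%%n", share * 100);  // 25%

        // Repairs covering any of those ranges fail while the node is down,
        // so the window left to repair everything keeps shrinking.
        System.out.println("Days left before gc grace expires: " + (gcGraceDays - daysDown));  // 10
    }
}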

What looked like a fault-tolerant Cassandra cluster that can easily afford 2 node failures
will require urgent attention and manual decision making every time a single node goes down,
just as it happened in the above scenario.

If some replicas are down, we should allow repair to proceed with the remaining replicas.
If the failed node comes back up before the gc grace period ends, we run repair to fix the
inconsistencies; otherwise, we discard the failed node's data and bootstrap it afresh. I
think that would make for a really robust, fault-tolerant system.
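For illustration, the option proposed in this ticket's description might then be used
roughly as below. The -force name comes straight from the ticket, but the flag and its
exact semantics are a proposal, not shipped behaviour, and the keyspace name is a
placeholder:

# Proposed, not yet implemented: repair the ranges whose remaining
# replicas are up, instead of aborting because one replica is down.
nodetool repair -force my_keyspace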




> Run repair with down replicas
> -----------------------------
>
>                 Key: CASSANDRA-10446
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10446
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Priority: Minor
>             Fix For: 3.x
>
>
> We should have an option of running repair when replicas are down. We can call it -force.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
