Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Wed, 30 Mar 2016 23:13:25 +0000 (UTC)
From: "Nick Bailey (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12954588.1459334926000.94912.1459379605480@Atlassian.JIRA>
In-Reply-To: <JIRA.12954588.1459334926000@Atlassian.JIRA>
References: <JIRA.12954588.1459334926000@Atlassian.JIRA>
 <JIRA.12954588.1459334926975@arcas>
Subject: [jira] [Commented] (CASSANDRA-11461) Failed incremental repairs
 never cleared from pending list
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219041#comment-15219041 ] 

Nick Bailey commented on CASSANDRA-11461:
-----------------------------------------

Yeah. OpsCenter lets you configure some tables for incremental repair and others for normal subrange repair, which is what was happening in this case. So OpsCenter does the following:

* Break up the ring into small chunks for subrange repair
* Visit a node and repair a small range for all tables that are using subrange repair
* If any tables are configured for incremental repair, run an incremental repair on those tables
** By default this would do a full incremental repair on those tables, which is what was in use when this bug was hit
* Jump across the ring to a different node and repeat the above process.

It does all this in a single datacenter, since OpsCenter does cross-DC repair.

That's at least the very high-level overview.
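
For anyone who wants the cycle above in a more concrete form, here is a rough sketch of how such a schedule could be planned. It is not OpsCenter's actual code: every name in it (Step, split_range, jump_order, plan_cycle) is hypothetical, it assumes each node has a single non-wrapping primary token range, and it only builds a list of planned steps rather than issuing any repair commands.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

TokenRange = Tuple[int, int]


@dataclass(frozen=True)
class Step:
    """One planned repair action (hypothetical model, not an OpsCenter type)."""
    node: str
    kind: str                          # "subrange" or "incremental"
    tables: Tuple[str, ...]
    token_range: Optional[TokenRange] = None


def split_range(start: int, end: int, chunks: int) -> List[TokenRange]:
    """Break one non-wrapping token range into contiguous small chunks."""
    bounds = [start + (end - start) * i // chunks for i in range(chunks + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def jump_order(nodes: List[str]) -> List[str]:
    """Interleave the two halves of the ring so consecutive visits are far apart."""
    half = (len(nodes) + 1) // 2
    order = []
    for i in range(half):
        order.append(nodes[i])
        if i + half < len(nodes):
            order.append(nodes[i + half])
    return order


def plan_cycle(primary_ranges: Dict[str, TokenRange],
               subrange_tables: List[str],
               incremental_tables: List[str],
               chunks_per_node: int) -> List[Step]:
    """Plan one pass: per visit, one small subrange repair, then an incremental repair."""
    order = jump_order(list(primary_ranges))
    chunks = {n: split_range(*primary_ranges[n], chunks_per_node) for n in order}
    steps: List[Step] = []
    for i in range(chunks_per_node):
        for node in order:
            if subrange_tables:
                # Repair one small range for all tables using subrange repair.
                steps.append(Step(node, "subrange", tuple(subrange_tables), chunks[node][i]))
            if incremental_tables:
                # Then run an incremental repair on the tables configured for it.
                steps.append(Step(node, "incremental", tuple(incremental_tables)))
    return steps


if __name__ == "__main__":
    ring = {"n1": (0, 100), "n2": (100, 200), "n3": (200, 300), "n4": (300, 400)}
    for step in plan_cycle(ring, ["ks.a"], ["ks.b"], chunks_per_node=2):
        print(step)
```

The point of the sketch is only that the subrange and incremental repairs are interleaved per node visit, so a failed incremental repair on one visit sits in the middle of an otherwise ongoing cycle.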

> Failed incremental repairs never cleared from pending list
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-11461
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11461
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Adam Hattrell
>
> Set up a test cluster with 2 DCs, heavy use of LCS (not sure if that's relevant).
> Kick off cassandra-stress against it.
> Kick off an automated incremental repair cycle.
> After a bit a node starts flapping, which causes a few repairs to fail.  These are never cleared out of pending repairs. Given the keyspace is replicated to all nodes, it means they all have pending repairs that will never complete.  Repairs are basically blocked at this point.
> Given we're using incremental repairs, you're now spammed with:
> "Cannot start multiple repair sessions over the same sstables"
> Cluster and logs are still available for review - message me for details.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)