Return-Path:
X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 11010181AC for ; Wed, 15 Jul 2015 11:28:10 +0000 (UTC)
Received: (qmail 62552 invoked by uid 500); 15 Jul 2015 11:28:04 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 62518 invoked by uid 500); 15 Jul 2015 11:28:04 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 62504 invoked by uid 99); 15 Jul 2015 11:28:04 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 15 Jul 2015 11:28:04 +0000
Date: Wed, 15 Jul 2015 11:28:04 +0000 (UTC)
From: "Andreas Schnitzerling (JIRA)"
To: commits@cassandra.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Created] (CASSANDRA-9812) Handle corrupted files during startup.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

Andreas Schnitzerling created CASSANDRA-9812:
------------------------------------------------

             Summary: Handle corrupted files during startup.
                 Key: CASSANDRA-9812
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9812
             Project: Cassandra
          Issue Type: New Feature
         Environment: Windows-7-32 bit, 3.2GB RAM, Java 1.7.0_55
            Reporter: Andreas Schnitzerling


This ticket relates to CASSANDRA-9686 (see that ticket for details). The conclusion from there:
In our company we cannot avoid power cuts on the nodes (both unexpected ones and deliberate ones for tests). We need a behavior that keeps the nodes online even when corrupted files are found during startup. One idea was to copy and scrub the corrupted files.
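To make the requested startup behavior concrete, here is a minimal sketch of setting corrupted files aside so that startup can continue. It is not Cassandra code: the function name `quarantine_corrupt_files`, the quarantine directory, and the `is_corrupt` check are all hypothetical, and it treats every file independently, which ignores that a real sstable consists of several component files.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def quarantine_corrupt_files(
    data_dir: Path,
    quarantine_dir: Path,
    is_corrupt: Callable[[Path], bool],
) -> List[Path]:
    """Move files flagged by is_corrupt out of data_dir; return their new paths.

    Hypothetical helper for illustration only; not part of Cassandra.
    """
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing first, so moving files
    # does not disturb the iteration.
    for path in sorted(data_dir.iterdir()):
        if not path.is_file():
            continue
        try:
            corrupt = is_corrupt(path)
        except OSError:
            # Assumption: a file that cannot even be read counts as corrupt.
            corrupt = True
        if corrupt:
            target = quarantine_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

A startup routine could call this before loading the data directory and log the returned paths, leaving the quarantined files for a later scrub. What counts as "corrupt" is left entirely to the caller-supplied check.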
[~Stefania] wrote:
{code}
Yes a disk corruption due to a power cut could explain it. I don't think we should delete corrupt sstables though, but we could maybe move them somewhere else - where they wouldn't be automatically loaded. Then the scrub tool could copy the fixed version back into the right folder, but this is kind of opposite of what it does at the moment (save a backup and then fix the original).
{code}
This would avoid stopping the nodes and keep the cluster running. We need this behavior only when the nodes start up, not during runtime. The only cause seems to be power cuts. The nodes are configured to start C* as a service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)