Date: Thu, 29 Jan 2015 20:37:34 +0000 (UTC)
From: "Robert Coli (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Created] (CASSANDRA-8703) incremental repair vs. bitrot

Robert Coli created CASSANDRA-8703:
--------------------------------------

Summary: incremental repair vs. bitrot
Key: CASSANDRA-8703
URL: https://issues.apache.org/jira/browse/CASSANDRA-8703
Project: Cassandra
Issue Type: Bug
Reporter: Robert Coli

Incremental repair is a great improvement in Cassandra, but it lacks a feature that non-incremental repair has: protection against bitrot.
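To make the gap concrete, here is a minimal sketch of how the two repair modes differ in which SSTables they look at. It is a simplified, hypothetical model, not Cassandra's code: the `SSTable` class, the `repaired_at` field (0 meaning "unrepaired") and `sstables_for_repair` are illustrative names. The point it shows is that a full repair compares every SSTable across replicas, while an incremental repair skips anything already marked repaired, so corruption that appears after the marking is never re-examined.

```python
from dataclasses import dataclass

# Hypothetical, simplified model: an SSTable counts as "repaired" once a
# repair session has stamped it with a non-zero repaired_at timestamp.
UNREPAIRED = 0


@dataclass
class SSTable:
    name: str
    repaired_at: int = UNREPAIRED
    corrupted: bool = False  # stands in for silent on-disk bitrot; never consulted

    def is_repaired(self) -> bool:
        return self.repaired_at != UNREPAIRED


def sstables_for_repair(sstables, incremental):
    """Return the SSTables a repair session would compare across replicas.

    Full (non-incremental) repair considers every SSTable; incremental
    repair skips anything already marked repaired.
    """
    if incremental:
        return [s for s in sstables if not s.is_repaired()]
    return list(sstables)


if __name__ == "__main__":
    old = SSTable("sst-1", repaired_at=1422563854)  # repaired earlier
    new = SSTable("sst-2")                          # written since then
    old.corrupted = True                            # bitrot after the repair

    print([s.name for s in sstables_for_repair([old, new], incremental=True)])   # ['sst-2']
    print([s.name for s in sstables_for_repair([old, new], incremental=False)])  # ['sst-1', 'sst-2']
```

In this model the corrupted `sst-1` is only ever seen by a full repair, which matches the scenario described in the ticket.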
Scenario:

1) Repair an SSTable, which marks it repaired.
2) A cosmic ray hits the hard drive, corrupting a record in the SSTable.
3) The range is now effectively unrepaired relative to the time the SSTable was repaired, but Cassandra still thinks it is repaired.

From my understanding, if bitrot is detected (e.g., via the CRC on the read path), then all SSTables containing the corrupted range need to be marked unrepaired on all replicas. Per marcuse@IRC, the naive/simplest response would be to just trigger a full repair in this case.

I am concerned about incremental repair as an operational default while it does not handle this case.

As an aside, this would also seem to require a new CRC on the uncompressed read path, as otherwise one cannot detect the corruption without periodically checksumming SSTables. Alternatively, a "nodetool checksum" function that verified table checksums, marked ranges unrepaired on failure, and could be run every gc_grace_seconds would seem to meet the requirement.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
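The proposed "nodetool checksum" pass could look roughly like the sketch below. Everything here is an assumption for illustration: "nodetool checksum" is only a proposal in this ticket, and `verify_and_unmark`, `expected_crc` and the token-range fields are hypothetical names. The sketch uses Python's `zlib.crc32` as a stand-in for Cassandra's on-disk checksums and treats each SSTable as one checksummed blob rather than per-chunk. On a mismatch it marks every local SSTable overlapping the corrupted range as unrepaired, so the next incremental repair covers that range again; telling the other replicas to do the same is left to the caller and is not modeled.

```python
import zlib
from dataclasses import dataclass

# Hypothetical model, not Cassandra's actual classes: 0 means "unrepaired".
UNREPAIRED = 0


@dataclass
class SSTable:
    name: str
    first_token: int
    last_token: int
    data: bytes
    expected_crc: int  # checksum recorded when the SSTable was written
    repaired_at: int = UNREPAIRED

    def overlaps(self, lo, hi):
        # Inclusive token bounds, simplified (no ring wraparound).
        return self.first_token <= hi and lo <= self.last_token


def verify_and_unmark(sstables):
    """Sketch of the proposed 'nodetool checksum' pass on one node.

    Recomputes each SSTable's CRC32; for every mismatch, marks all local
    SSTables overlapping the corrupted token range as unrepaired so the next
    incremental repair will consider that range again. Returns the bad
    ranges, which a real implementation would also have to propagate to the
    other replicas (not modeled here).
    """
    bad_ranges = [
        (s.first_token, s.last_token)
        for s in sstables
        if zlib.crc32(s.data) != s.expected_crc
    ]
    for lo, hi in bad_ranges:
        for s in sstables:
            if s.overlaps(lo, hi):
                s.repaired_at = UNREPAIRED
    return bad_ranges
```

Scheduled more often than gc_grace_seconds, as the ticket suggests, such a pass would bound how long a corrupted range can stay falsely marked repaired; the simpler alternative mentioned above is to skip the unmarking and just trigger a full repair whenever `bad_ranges` is non-empty.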