From: Robert Coli
To: "user@cassandra.apache.org"
Date: Thu, 11 Sep 2014 10:18:11 -0700
Subject: Re: Detecting bitrot with incremental repair

On Thu, Sep 11, 2014 at 9:44 AM, John Sumsion wrote:

> jbellis talked about incremental repair, which is great, but as I
> understood, repair was also somewhat responsible for detecting and
> repairing bitrot on long-lived sstables.

SSTable checksums, and the checksums on individual compressed (and only compressed) partitions, provide some of this functionality, at the very least giving some visibility into bitrot-style corruption.

> If repair doesn't do it, what will?

Read repair will help, but only repair is capable of providing the guarantee you need. Probably Cassandra needs partition checksums on uncompressed partitions, and then to mark an sstable un-repaired when it detects a corrupt read.

=Rob
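
To make the last suggestion concrete, here is a minimal sketch of the idea: store a checksum next to each block at write time, verify it on every read, and mark the containing table un-repaired when a mismatch shows up. Everything in it (ChecksummedTable, CorruptBlockError, flip_bit, the in-memory list of blocks, the choice of CRC32) is a hypothetical illustration, not Cassandra's on-disk format or API.

```python
import zlib
from dataclasses import dataclass, field


class CorruptBlockError(Exception):
    """Raised when a block's stored checksum does not match its contents."""


@dataclass
class ChecksummedTable:
    """Hypothetical stand-in for an sstable; not Cassandra's real format."""
    blocks: list = field(default_factory=list)  # (payload, crc32) pairs
    repaired: bool = True                       # cleared when corruption is seen

    def append(self, payload: bytes) -> None:
        # Record the checksum alongside the payload at write time.
        self.blocks.append((payload, zlib.crc32(payload)))

    def read(self, index: int) -> bytes:
        payload, stored_crc = self.blocks[index]
        if zlib.crc32(payload) != stored_crc:
            # Corrupt read: mark the table un-repaired so that a later
            # repair would consider it again, then surface the error.
            self.repaired = False
            raise CorruptBlockError(f"checksum mismatch in block {index}")
        return payload


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate bitrot by flipping a single bit in a copy of the data."""
    corrupted = bytearray(data)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    table = ChecksummedTable()
    table.append(b"partition-a")
    table.append(b"partition-b")

    # Corrupt the second block on "disk" without touching its checksum.
    payload, crc = table.blocks[1]
    table.blocks[1] = (flip_bit(payload, 3), crc)

    try:
        table.read(1)
    except CorruptBlockError as err:
        print(err, "| repaired flag is now", table.repaired)
```

The point of the sketch is only the control flow: detection happens on the read path, and the repaired flag is what would let an incremental repair pick the table up again instead of skipping it.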