Subject: Re: Best way to detect/fix bitrot today?
From: Anand Somani
To: user@cassandra.apache.org
Date: Tue, 8 Feb 2011 06:23:29 -0800

I should have clarified that we have 3 copies, so in that case, as long as 2 match, we should be OK?

Even if there were checksumming at the SSTable level, I assume it would have to check and report these errors on compaction (or node repair)?

I have seen some open JIRA issues on this (47 and 1717), but if I need something today, is a read repair (or a node repair) the only viable option?

On Mon, Feb 7, 2011 at 12:09 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> > Our application space is such that there is data that might not be read
> > for a long time. The data is mostly immutable. How should I approach
> > detecting/solving the bitrot problem? One approach is to read the data
> > and let read repair do the detection, but given the size of the data,
> > that does not look very efficient.
>
> Note that read-repair is not really intended to repair arbitrary
> corruptions. Unless I'm mistaken, with arbitrary corruption (unless it
> triggers a serialization failure that causes row skipping), it's a
> toss-up which version of the data is retained (or both, if the
> corruption is in the key). Given the same key and column timestamp,
> the tie breaker is the column value. So depending on whether
> corruption results in a "lesser" or "greater" value, you might get the
> corrupt or non-corrupt data.
>
> > Has anybody solved/worked around this or has any other suggestions to
> > detect and fix bitrot?
>
> My feel/tentative opinion is that the clean fix is for Cassandra to
> support strong checksumming at the sstable level.
>
> Deploying on e.g. ZFS would help a lot with this, but that's a problem
> for deployment on Linux (which is the recommended platform for
> Cassandra).
>
> --
> / Peter Schuller

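To make the two points in this thread concrete, here is a minimal Python sketch. It is not Cassandra code and does not use any Cassandra API. The names `reconcile` and `majority_value` are hypothetical, and the rule that the "greater" value wins a timestamp tie is an assumption made for illustration (the thread only says the value is the tie breaker). The first function shows why a value-based tie break can keep the corrupt copy; the second shows the "2 of 3 copies match" idea using a strong checksum computed by the application.

```python
import hashlib
from collections import Counter


def checksum(data: bytes) -> str:
    """Strong checksum of a value (SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


def reconcile(a, b):
    """Pick one of two (timestamp, value) versions.

    ASSUMPTION for illustration: the higher timestamp wins, and on a
    timestamp tie the lexically greater value wins. Tuple comparison
    gives exactly that ordering.
    """
    return max(a, b)


def majority_value(replicas):
    """Return the value held by a strict majority of replicas, else None."""
    counts = Counter(checksum(v) for v in replicas)
    digest, votes = counts.most_common(1)[0]
    if votes * 2 <= len(replicas):
        return None  # no strict majority, cannot decide which copy is good
    return next(v for v in replicas if checksum(v) == digest)


good = b"hello"
rotted_high = b"hellp"  # last byte changed to a greater value
rotted_low = b"helln"   # last byte changed to a lesser value

# Same timestamp on both copies: the outcome depends only on the value.
kept_high = reconcile((100, good), (100, rotted_high))
kept_low = reconcile((100, good), (100, rotted_low))

# With 3 copies, 2 matching copies outvote 1 corrupt copy.
voted = majority_value([good, good, rotted_high])
undecided = majority_value([good, rotted_high, rotted_low])
```

Under the stated assumption, `kept_high` holds the corrupt value while `kept_low` holds the good one, which is the toss-up described above. The majority vote only helps if something actually reads and compares all three copies, which is why it still depends on a read or a repair pass over the data.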