Date: Tue, 20 Dec 2011 12:19:32 +0000 (UTC)
From: "Sylvain Lebresne (Commented) (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <1341796140.30501.1324383572450.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <189759327.16831.1323979890602.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-3641) inconsistent/corrupt counters w/ broken shards never converge

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173138#comment-13173138 ]

Sylvain Lebresne commented on CASSANDRA-3641:
---------------------------------------------

Let's open a separate ticket to discuss that. So far we've used the log only for recording errors, so let's keep it that way for this ticket.

> inconsistent/corrupt counters w/ broken shards never converge
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3641
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3641
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>         Attachments: 3641-0.8-internal-not-for-inclusion.txt, 3641-trunk.txt
>
>
> We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had counters that were corrupt (hopefully due to CASSANDRA-3178). The corruption was that there would exist shards with the *same* node_id, *same* clock id, but *different* counts.
> The counter column diffing and reconciliation code assumes that this never happens, and ignores the count. The problem with this is that if there is an inconsistency, the result of a reconciliation will depend on the order of the shards.
> In our case for example, we would see the value of the counter randomly fluctuating on a CL.ALL read, but we would get a consistent value (whatever the node had) on CL.ONE (submitted to one of the nodes in the replica set for the key).
> In addition, read repair would not work despite digest mismatches, because the diffing algorithm also did not care about the counts when determining the differences to send.
> I'm attaching patches that fix this. The first patch is against our 0.8 branch, which is not terribly useful to people, but I include it because it is the well-tested version that we have used on the production cluster which was subject to this corruption.
> The other patch is against trunk, and contains the same change.
> What the patch does is:
> * On diffing, treat as DISJOINT if there is a count discrepancy.
> * On reconciliation, look at the count and *deterministically* pick the higher one, and:
> ** log the fact that we detected a corrupt counter
> ** increment a JMX observable counter for monitoring purposes
> A cluster which is subject to such corruption and has this patch will fix itself with an AES + compact (or just repeated compactions, assuming the replicate-on-compact is able to deliver correctly).
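
For readers who want to see the shape of the change described in the quoted ticket, here is a minimal sketch of the two rules (diff reports DISJOINT on a count discrepancy; reconcile deterministically picks the higher count, logs, and bumps a counter). This is not the attached patch and not Cassandra's actual code: the class ShardReconcileSketch, the Shard type, the Relationship enum, and the corruptShardsDetected counter are all hypothetical names, the AtomicLong only stands in for the JMX-observable counter, and the sketch compares a single pair of shards for the same node rather than whole counter contexts.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; names and structure are illustrative,
// not taken from the Cassandra source or from the attached patches.
final class ShardReconcileSketch {

    // Assumed set of outcomes for comparing two shards of the same node.
    enum Relationship { EQUAL, GREATER_THAN, LESS_THAN, DISJOINT }

    // One counter shard: (node id, logical clock, count).
    static final class Shard {
        final String nodeId;
        final long clock;
        final long count;

        Shard(String nodeId, long clock, long count) {
            this.nodeId = nodeId;
            this.clock = clock;
            this.count = count;
        }
    }

    // Stand-in for the JMX-observable counter mentioned in the ticket.
    static final AtomicLong corruptShardsDetected = new AtomicLong();

    // Diff rule: same clock but different count is reported as DISJOINT
    // instead of EQUAL, so the difference is not silently ignored.
    static Relationship diff(Shard left, Shard right) {
        requireSameNode(left, right);
        if (left.clock == right.clock) {
            return left.count == right.count ? Relationship.EQUAL : Relationship.DISJOINT;
        }
        return left.clock > right.clock ? Relationship.GREATER_THAN : Relationship.LESS_THAN;
    }

    // Reconcile rule: the higher clock wins; on a clock tie with differing
    // counts, pick the higher count so the result does not depend on the
    // order of the arguments.
    static Shard reconcile(Shard left, Shard right) {
        requireSameNode(left, right);
        if (left.clock != right.clock) {
            return left.clock > right.clock ? left : right;
        }
        if (left.count != right.count) {
            corruptShardsDetected.incrementAndGet();
            System.err.println("Corrupt counter shard for node " + left.nodeId
                    + " at clock " + left.clock + ": counts "
                    + left.count + " and " + right.count);
            return left.count > right.count ? left : right;
        }
        return left;
    }

    private static void requireSameNode(Shard left, Shard right) {
        if (!left.nodeId.equals(right.nodeId)) {
            throw new IllegalArgumentException("shards belong to different nodes");
        }
    }
}
```

With this rule, reconcile(a, b) and reconcile(b, a) return the same shard even when the two disagree on the count, which is the property whose absence made the CL.ALL reads fluctuate.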