Return-Path: Delivered-To: apmail-incubator-cassandra-commits-archive@minotaur.apache.org Received: (qmail 57008 invoked from network); 23 Nov 2009 21:48:02 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 23 Nov 2009 21:48:02 -0000 Received: (qmail 51492 invoked by uid 500); 23 Nov 2009 21:48:02 -0000 Delivered-To: apmail-incubator-cassandra-commits-archive@incubator.apache.org Received: (qmail 51421 invoked by uid 500); 23 Nov 2009 21:48:02 -0000 Mailing-List: contact cassandra-commits-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: cassandra-dev@incubator.apache.org Delivered-To: mailing list cassandra-commits@incubator.apache.org Received: (qmail 51406 invoked by uid 99); 23 Nov 2009 21:48:02 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 23 Nov 2009 21:48:02 +0000 X-ASF-Spam-Status: No, hits=-10.5 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 23 Nov 2009 21:47:59 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id A6A29234C04C for ; Mon, 23 Nov 2009 13:47:39 -0800 (PST) Message-ID: <521376102.1259012859670.JavaMail.jira@brutus> Date: Mon, 23 Nov 2009 21:47:39 +0000 (UTC) From: "Stu Hood (JIRA)" To: cassandra-commits@incubator.apache.org Subject: [jira] Commented: (CASSANDRA-193) Proactive repair In-Reply-To: <761101744.1242935325628.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781641#action_12781641 ] Stu Hood commented on CASSANDRA-193: ------------------------------------ > see whether we lose any precision by simply using bitwise XOR to combine row hashes Doing the math on this one was simpler than actually testing it: using a commutative hash function like XOR means that the number of possible inputs goes from being a Permutation of the leaves to being a Combination of the leaves (since a set of leaves in any order are equal). For XOR you have: (2^127)! / (2^127 - 2^16)! == number of possible permutations of 2^16 hashes of length 127 And for MD5: (2^127)! / (2^16)! * (2^127 - 2^16)! == number of possible combinations of 2^16 hashes of length 127 I wouldn't think it would be possible to notice such a small difference. Nonetheless, I think it is a moot point: > Using XOR further simplifies how hash values for internal nodes are computed. I don't think that using XOR is significantly more efficient. Because XOR is associative, it is possible to hash arbitrary sequential leaves together, which is impossible with MD5, but we never do this: all of our comparison happens on the boundaries defined by IPartitioner.midpoint(), so the tree structure containing predefined/precomputed values can contain any value required for comparison. > Proactive repair > ---------------- > > Key: CASSANDRA-193 > URL: https://issues.apache.org/jira/browse/CASSANDRA-193 > Project: Cassandra > Issue Type: New Feature > Components: Core > Reporter: Jonathan Ellis > Assignee: Stu Hood > Fix For: 0.5 > > Attachments: 193-1-tree-preparation.diff, 193-1-tree-preparation.diff, 193-2-tree.diff, 193-2-tree.diff, 193-3-aes-preparation.diff, 193-3-aes-preparation.diff, 193-4-aes.diff, 193-4-aes.diff, 193-5-manual-repair.diff, 193-6-inverted-filter.diff, 193-6-inverted-filter.diff, 193-7-disable-caching-and-fix-minimum-token.diff, 193-breakdown.txt, mktree-and-binary-tree.png > > > Currently cassandra supports "read repair," i.e., lazy repair when a read is done. This is better than nothing but is not sufficient for some cases (e.g. catastrophic node failure where you need to rebuild all of a node's data on a new machine). > Dynamo uses merkle trees here. This is harder for Cassandra given the CF data model but I suppose we could just hash the serialized CF value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.