Date: Thu, 22 Jun 2017 13:49:01 +0000 (UTC)
From: "Cameron Zemek (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-8911) Consider Mutation-based Repairs

     [ https://issues.apache.org/jira/browse/CASSANDRA-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cameron Zemek updated CASSANDRA-8911:
-------------------------------------
    Labels: repair  (was: )

> Consider Mutation-based Repairs
> -------------------------------
>
>                 Key: CASSANDRA-8911
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8911
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Tyler Hobbs
>              Labels: repair
>             Fix For: 4.x
>
>
> We should consider a mutation-based repair to replace the existing streaming repair. While we're at it, we could do away with a lot of the complexity around merkle trees.
> I have not planned this out in detail, but here's roughly what I'm thinking:
> * Instead of building an entire merkle tree up front, just send the "leaves" one-by-one. Instead of dealing with token ranges, make the leaves primary key ranges. The PK ranges would need to be contiguous, so that the start of each range would match the end of the previous range. (The first and last leaves would need to be open-ended on one end of the PK range.)
> This would be similar to doing a read with paging.
> * Once one page of data is read, compute a hash of it and send it to the other replicas along with the PK range that it covers and a row count.
> * When the replicas receive the hash, they perform a read over the same PK range (using a LIMIT of the row count + 1) and compare hashes (unless the row counts don't match, in which case this can be skipped).
> * If there is a mismatch, the replica will send a mutation covering that page's worth of data (ignoring the row count this time) to the source node.
> Here are the advantages that I can think of:
> * With the current repair behavior of streaming, vnode-enabled clusters may need to stream hundreds of small SSTables. This results in increased compaction load on the receiving node. With the mutation-based approach, memtables would naturally merge these.
> * It's simple to throttle. For example, you could give a number of rows/sec that should be repaired.
> * It's easy to see what PK range has been repaired so far. This could make it simpler to resume a repair that fails midway.
> * Inconsistencies start to be repaired almost right away.
> * Less special code (?)
> * Wide partitions are no longer a problem.
> There are a few problems I can think of:
> * Counters. I don't know if this can be made safe, or if they need to be skipped.
> * To support incremental repair, we need to be able to read from only repaired sstables. Probably not too difficult to do.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org
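The per-page hash comparison in the proposal above can be sketched roughly as follows. This is a hypothetical illustration, not Cassandra code: the class and method names (`PageHashSketch`, `pageHash`, `compare`) are invented, rows are stood in for by serialized strings, and MD5 is used only as a placeholder digest. It shows the replica-side logic: skip hashing when the row counts differ, otherwise compare digests over the same PK range.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class PageHashSketch {

    enum Outcome { IN_SYNC, COUNT_MISMATCH, HASH_MISMATCH }

    // Hash one "leaf": the serialized rows of a single PK-range page.
    // The source node would send this digest plus the PK range and row count.
    static byte[] pageHash(List<String> serializedRows) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String row : serializedRows)
                md.update(row.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // MD5 is always available
        }
    }

    // Replica-side check for one page: if the local read over the same PK
    // range returns a different row count, the hash comparison can be
    // skipped entirely; otherwise compare digests. A non-IN_SYNC outcome
    // would trigger sending a mutation covering this page back to the source.
    static Outcome compare(int sourceCount, byte[] sourceHash, List<String> localRows) {
        if (localRows.size() != sourceCount)
            return Outcome.COUNT_MISMATCH;
        return MessageDigest.isEqual(sourceHash, pageHash(localRows))
               ? Outcome.IN_SYNC : Outcome.HASH_MISMATCH;
    }
}
```

Because each leaf is just a contiguous PK range with a count and a digest, a coordinator could throttle by pages per second and record the last PK range confirmed, which is what makes the resume-after-failure property in the list above cheap to get.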