Date: Fri, 8 Aug 2014 03:22:11 +0000 (UTC)
From: "Mike Schrag (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-7720) Add a more consistent snapshot mechanism

    [ https://issues.apache.org/jira/browse/CASSANDRA-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090251#comment-14090251 ]

Mike Schrag commented on CASSANDRA-7720:
----------------------------------------

I agree about not having any guarantees on ordering. For our running system this isn't a big deal, because it will eventually be correct. Snapshotting is an interesting problem, though, because you potentially preserve a view of the world that you can never recover from in your backups. With what I'm proposing, if you snapshot an entire cluster and then restore it onto a brand new cluster, you at least get a cluster-wide consistent view of the universe at time 't'. In the current system, you can get unlucky and manage to literally never get an A written to disk (we had this happen). With the consistent time-t snapshot, your backup would be globally consistent up to any given point: you might get an A without a B, but you'd never get a B without an A. The backup-and-restore case is really nasty because it's conceptually like an infinite-duration network partition, so if you don't do your best to capture a good view of the world up front, no amount of eventual consistency is ever going to fix you up.

> Add a more consistent snapshot mechanism
> ----------------------------------------
>
>                 Key: CASSANDRA-7720
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7720
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Mike Schrag
>
> We've hit an interesting issue with snapshotting, which makes sense in hindsight but presents an interesting challenge for consistent restores:
> * initiate snapshot
> * snapshotting flushes table A and takes the snapshot
> * insert into table A
> * insert into table B
> * snapshotting flushes table B and takes the snapshot
> * snapshot finishes
> So what happens here is that we end up having a B but NOT an A, even though B was chronologically inserted after A.
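To make that sequence concrete, here is a toy replay of the six steps against two in-memory maps standing in for tables A and B. None of this is Cassandra code; the class and variable names are purely illustrative, and the point is only to show the snapshot ending up with B's write but not A's.

// Toy simulation of the flush-ordering problem described above. Not Cassandra code;
// the maps below just stand in for memtables and for the files captured by the snapshot.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class SnapshotOrderingDemo {
    public static void main(String[] args) {
        Map<String, List<String>> memtables = new HashMap<>();
        memtables.put("A", new ArrayList<>());
        memtables.put("B", new ArrayList<>());
        Map<String, List<String>> snapshot = new HashMap<>();

        // 1. initiate snapshot; 2. table A is flushed and captured first
        snapshot.put("A", new ArrayList<>(memtables.get("A")));

        // 3. insert into table A, then 4. insert into table B (so A's write is the older one)
        memtables.get("A").add("write-to-A");
        memtables.get("B").add("write-to-B");

        // 5. table B is flushed and captured after both new writes landed
        snapshot.put("B", new ArrayList<>(memtables.get("B")));

        // 6. snapshot finishes: it holds the write to B but not the chronologically earlier write to A
        System.out.println("snapshot A = " + snapshot.get("A")); // []
        System.out.println("snapshot B = " + snapshot.get("B")); // [write-to-B]
    }
}

Running it prints an empty snapshot for table A and [write-to-B] for table B, which is exactly the B-without-an-A state described above.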
> It makes sense when I think about what snapshot is doing, but I wonder if snapshots should get a little fancier and behave a little more like what I think most people would expect. What I think should happen is something along the lines of the following, for each node:
> * pass a client timestamp in the snapshot call corresponding to "now"
> * snapshot the tables using the existing procedure
> * walk backwards through the hard-linked sstables in that snapshot:
>   * if the earliest update in the sstable is after the client's timestamp, delete the sstable from the snapshot
>   * if the earliest update in the sstable is before the client's timestamp, look at the latest update and walk backwards through the sstable:
>     * if any updates fall after the timestamp, write a copy of the sstable into the snapshot folder containing only data up to the timestamp, then delete the original sstable from the snapshot (we need the copy because we're likely holding a shared hard-linked sstable)
> I think this would guarantee a chronologically consistent view across all machines and column families within a given snapshot.
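Below is a minimal sketch of that per-node filtering step, written in Java for illustration. It assumes only that each hard-linked sstable in a snapshot exposes the minimum and maximum cell timestamps recorded in its metadata; SnapshotSSTable, SnapshotTrimmer, trim and rewriteUpTo are hypothetical stand-ins for this sketch, not existing Cassandra APIs.

// Hedged sketch of the per-sstable cutoff decision described in the proposal above.
// SnapshotSSTable is an illustrative stand-in, not a real Cassandra class.
import java.util.List;

public final class SnapshotTrimmer {

    /** Minimal stand-in for a hard-linked sstable sitting in a snapshot directory. */
    interface SnapshotSSTable {
        long minTimestampMicros();            // earliest cell timestamp recorded in the sstable
        long maxTimestampMicros();            // latest cell timestamp recorded in the sstable
        void deleteFromSnapshot();            // drop this hard link from the snapshot folder
        void rewriteUpTo(long cutoffMicros);  // write a copy keeping only cells <= cutoff, then drop the original link
    }

    /** Apply the client-supplied cutoff to every sstable captured by the snapshot. */
    static void trim(List<SnapshotSSTable> snapshotTables, long cutoffMicros) {
        for (SnapshotSSTable sstable : snapshotTables) {
            if (sstable.minTimestampMicros() > cutoffMicros) {
                // Everything in this sstable is newer than the client's "now": keep none of it.
                sstable.deleteFromSnapshot();
            } else if (sstable.maxTimestampMicros() > cutoffMicros) {
                // The sstable straddles the cutoff: copy rather than truncate in place,
                // because the snapshot only holds a shared hard link to the live file.
                sstable.rewriteUpTo(cutoffMicros);
            }
            // else: the sstable is entirely at or before the cutoff; keep it unchanged.
        }
    }
}

The middle branch is the subtle part: an sstable that straddles the cutoff can't be truncated in place, because the snapshot only holds a hard link to a file the live data directory is still using, so the trimmed data has to land in a new file before the original link is removed.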