Date: Fri, 8 Aug 2014 03:22:11 +0000 (UTC)
From: "Mike Schrag (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-7720) Add a more consistent snapshot mechanism

    [ https://issues.apache.org/jira/browse/CASSANDRA-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090251#comment-14090251 ]

Mike Schrag commented on CASSANDRA-7720:
----------------------------------------

I agree about not having any guarantees on ordering. For our running system this isn't a big deal, because it will eventually be correct. Snapshotting is an interesting problem, though, because you potentially preserve a view of the world that you can never recover from in your backups. With what I'm proposing, if you snapshot an entire cluster and then restore it onto a brand new cluster, you at least get a cluster-wide consistent view of the universe at time 't'. In the current system, you can get unlucky and manage to literally never get an A written to disk (we had this happen). With the consistent time-t snapshot, your backup would be globally consistent up to any given point: you might get an A without a B, but you'd never get a B without an A. The backup-and-restore case is really nasty because it's conceptually like an infinite-duration network partition, so if you don't do your best to capture a good view of the world up front, no amount of eventual consistency is ever going to fix you up.

> Add a more consistent snapshot mechanism
> ----------------------------------------
>
>                 Key: CASSANDRA-7720
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7720
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Mike Schrag
>
> We've hit an interesting issue with snapshotting, which makes sense in hindsight but presents an interesting challenge for consistent restores:
> * initiate snapshot
> * snapshotting flushes table A and takes the snapshot
> * insert into table A
> * insert into table B
> * snapshotting flushes table B and takes the snapshot
> * snapshot finishes
> So what happens here is that we end up having a B but NOT an A, even though B was chronologically inserted after A.
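To make that sequence concrete, here is a toy replay of the six steps against two in-memory maps standing in for tables A and B. None of this is Cassandra code; the class and variable names are purely illustrative, and the point is only to show the snapshot ending up with B's write but not A's.

// Toy simulation of the flush-ordering problem described above. Not Cassandra code;
// the maps below just stand in for memtables and for the files captured by the snapshot.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class SnapshotOrderingDemo {
    public static void main(String[] args) {
        Map<String, List<String>> memtables = new HashMap<>();
        memtables.put("A", new ArrayList<>());
        memtables.put("B", new ArrayList<>());
        Map<String, List<String>> snapshot = new HashMap<>();

        // 1. initiate snapshot; 2. table A is flushed and captured first
        snapshot.put("A", new ArrayList<>(memtables.get("A")));

        // 3. insert into table A, then 4. insert into table B (so A's write is the older one)
        memtables.get("A").add("write-to-A");
        memtables.get("B").add("write-to-B");

        // 5. table B is flushed and captured after both new writes landed
        snapshot.put("B", new ArrayList<>(memtables.get("B")));

        // 6. snapshot finishes: it holds the write to B but not the chronologically earlier write to A
        System.out.println("snapshot A = " + snapshot.get("A")); // []
        System.out.println("snapshot B = " + snapshot.get("B")); // [write-to-B]
    }
}

Running it prints an empty snapshot for table A and [write-to-B] for table B, which is exactly the B-without-an-A state described above.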
> It makes sense when I think about what snapshot is doing, but I wonder if snapshots should get a little fancier and behave a little more like what I think most people would expect. What I think should happen is something along the lines of the following, for each node:
> * pass a client timestamp in the snapshot call corresponding to "now"
> * snapshot the tables using the existing procedure
> * walk backwards through the hard-linked sstables in that snapshot:
>   * if the earliest update in the sstable is after the client's timestamp, delete the sstable from the snapshot
>   * if the earliest update in the sstable is before the client's timestamp, look at the latest update and walk backwards through the sstable:
>     * if any updates fall after the timestamp, write a copy of the sstable into the snapshot folder containing only data up to the timestamp, then delete the original sstable from the snapshot (we need the copy because we're likely holding a shared hard-linked sstable)
> I think this would guarantee a chronologically consistent view across all machines and column families within a given snapshot.
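Below is a minimal sketch of that per-node filtering step, written in Java for illustration. It assumes only that each hard-linked sstable in a snapshot exposes the minimum and maximum cell timestamps recorded in its metadata; SnapshotSSTable, SnapshotTrimmer, trim and rewriteUpTo are hypothetical stand-ins for this sketch, not existing Cassandra APIs.

// Hedged sketch of the per-sstable cutoff decision described in the proposal above.
// SnapshotSSTable is an illustrative stand-in, not a real Cassandra class.
import java.util.List;

public final class SnapshotTrimmer {

    /** Minimal stand-in for a hard-linked sstable sitting in a snapshot directory. */
    interface SnapshotSSTable {
        long minTimestampMicros();            // earliest cell timestamp recorded in the sstable
        long maxTimestampMicros();            // latest cell timestamp recorded in the sstable
        void deleteFromSnapshot();            // drop this hard link from the snapshot folder
        void rewriteUpTo(long cutoffMicros);  // write a copy keeping only cells <= cutoff, then drop the original link
    }

    /** Apply the client-supplied cutoff to every sstable captured by the snapshot. */
    static void trim(List<SnapshotSSTable> snapshotTables, long cutoffMicros) {
        for (SnapshotSSTable sstable : snapshotTables) {
            if (sstable.minTimestampMicros() > cutoffMicros) {
                // Everything in this sstable is newer than the client's "now": keep none of it.
                sstable.deleteFromSnapshot();
            } else if (sstable.maxTimestampMicros() > cutoffMicros) {
                // The sstable straddles the cutoff: copy rather than truncate in place,
                // because the snapshot only holds a shared hard link to the live file.
                sstable.rewriteUpTo(cutoffMicros);
            }
            // else: the sstable is entirely at or before the cutoff; keep it unchanged.
        }
    }
}

The middle branch is the subtle part: an sstable that straddles the cutoff can't be truncated in place, because the snapshot only holds a hard link to a file the live data directory is still using, so the trimmed data has to land in a new file before the original link is removed.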