cassandra-commits mailing list archives

From "Mike Schrag (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7720) Add a more consistent snapshot mechanism
Date Fri, 08 Aug 2014 03:22:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090251#comment-14090251
] 

Mike Schrag commented on CASSANDRA-7720:
----------------------------------------

I agree that there are no guarantees on ordering. For our running system, this isn't
a big deal, because it will eventually be correct. Snapshotting is an interesting problem,
though, because your backups can preserve a view of the world that you can never recover
from. With what I'm proposing, if you snapshot an entire cluster and then restore
it onto a brand new cluster, you at least get a cluster-wide consistent view of the universe
at time 't'. In the current system, you can get unlucky and literally never get
an A written to disk (we had this happen). With the consistent time-t snapshot, your backup
would be globally consistent up to the chosen point, so you might get an A without a B, but
you'd never get a B without an A. The backup-and-restore case is really nasty because it's
conceptually like an infinite-duration network partition, so if you don't try your best to get
a good view of the world, no amount of eventual consistency is ever going to fix you up.
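
To make the failure concrete, here is a minimal Python sketch of the race from the issue description, followed by the time-t cut. Everything in it is a toy model for illustration, not Cassandra code: `Table`, `naive_snapshot_race`, and `cut_at` are hypothetical names, and timestamps are plain integers with the snapshot initiated at t = 0.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy stand-in for a table: an ordered list of (timestamp, value) writes."""
    name: str
    writes: list = field(default_factory=list)

    def insert(self, ts, value):
        self.writes.append((ts, value))

    def flush_and_snapshot(self):
        # Captures everything written to this table so far.
        return list(self.writes)


def naive_snapshot_race():
    """Replay the sequence from the issue: A is snapshotted before B."""
    a, b = Table("A"), Table("B")
    snapshot = {}
    snapshot["A"] = a.flush_and_snapshot()  # table A flushed and snapshotted
    a.insert(1, "a1")                       # insert into A (misses the snapshot)
    b.insert(2, "b1")                       # insert into B, after the A insert
    snapshot["B"] = b.flush_and_snapshot()  # table B flushed and snapshotted
    return snapshot


def cut_at(snapshot, t):
    """Keep only writes stamped at or before t (the boundary choice is an assumption)."""
    return {
        name: [(ts, v) for ts, v in rows if ts <= t]
        for name, rows in snapshot.items()
    }


snap = naive_snapshot_race()
consistent = cut_at(snap, 0)  # 0 is the client's "now" when the snapshot began
```

In this toy run the raw snapshot holds the B write but not the earlier A write, while the cut at t = 0 drops both, which is the "never a B without an A" property.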

> Add a more consistent snapshot mechanism
> ----------------------------------------
>
>                 Key: CASSANDRA-7720
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7720
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Mike Schrag
>
> We’ve hit an interesting issue with snapshotting, which makes sense in hindsight, but
> presents an interesting challenge for consistent restores:
> * initiate snapshot
> * snapshotting flushes table A and takes the snapshot
> * insert into table A
> * insert into table B
> * snapshotting flushes table B and takes the snapshot
> * snapshot finishes
> So what happens here is that we end up having a B, but NOT having an A, even though B
> was chronologically inserted after A.
> It makes sense when I think about what snapshot is doing, but I wonder if snapshots actually
> should get a little fancier to behave a little more like what I think most people would expect.
> What I think should happen is something along the lines of the following:
> For each node:
> * pass a client timestamp in the snapshot call corresponding to "now"
> * snapshot the tables using the existing procedure
> * walk backwards through the linked snapshot sstables in that snapshot
>   * if the earliest update in that sstable is after the client's timestamp, delete the
>     sstable in the snapshot
>   * if the earliest update in the sstable is before the client's timestamp, then look
>     at the last update. Walk backwards through that sstable.
>     * if any updates fall after the timestamp, make a copy of that sstable in the snapshot
>       folder only up to the point of the timestamp and then delete the original sstable in
>       the snapshot (we need to copy because we're likely holding a shared hard linked sstable)
> I think this would guarantee that you have a chronologically consistent view of your
> snapshot across all machines and columnfamilies within a given snapshot.
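
The per-node procedure proposed in the description can be sketched roughly as follows. This is a simplified in-memory model under stated assumptions, not Cassandra's sstable format or API: `Update`, `SSTable`, and `trim_snapshot` are hypothetical names, an update stamped exactly at the cutoff is kept (the description does not say which side the boundary falls on), and the sketch filters every update by timestamp instead of truncating at a position, since it does not assume updates are stored in timestamp order.

```python
from typing import NamedTuple, Tuple


class Update(NamedTuple):
    timestamp: int
    key: str
    value: str


class SSTable(NamedTuple):
    name: str
    updates: Tuple[Update, ...]
    hard_linked: bool = True  # snapshot files start out as hard links


def trim_snapshot(sstables, cutoff):
    """Return the snapshot's sstables with every update after `cutoff` removed."""
    kept = []
    for sst in sstables:
        if not sst.updates:
            kept.append(sst)  # nothing to trim
            continue
        stamps = [u.timestamp for u in sst.updates]
        if min(stamps) > cutoff:
            # Earliest update is after the cutoff: drop the whole sstable.
            continue
        if max(stamps) <= cutoff:
            # Entirely at or before the cutoff: keep the hard link untouched.
            kept.append(sst)
            continue
        # Straddles the cutoff: write a trimmed copy and drop the original link,
        # because the hard-linked file is shared with the live data directory.
        trimmed = tuple(u for u in sst.updates if u.timestamp <= cutoff)
        kept.append(SSTable(sst.name + "-trimmed", trimmed, hard_linked=False))
    return kept


old = SSTable("old", (Update(1, "k1", "a"), Update(2, "k2", "b")))
mixed = SSTable("mixed", (Update(3, "k1", "c"), Update(7, "k3", "d")))
new = SSTable("new", (Update(8, "k4", "e"),))
result = trim_snapshot([old, mixed, new], cutoff=5)
```

With a cutoff of 5, `old` is kept as a hard link, `mixed` is replaced by a copy holding only the update at timestamp 3, and `new` is dropped from the snapshot.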



--
This message was sent by Atlassian JIRA
(v6.2#6252)
