Subject: Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?
Date: Thu, 23 Jun 2011 08:53:59 -0700
From: Josep Blanquer <blanquer@rightscale.com>
To: user@cassandra.apache.org

On Thu, Jun 23, 2011 at 8:02 AM, William Oberman wrote:

> I've been doing EBS snapshots for MySQL for some time now, and was using a
> similar pattern as Josep (XFS with freeze, snap, unfreeze), with the extra
> complication that I was actually using 8 EBS volumes in RAID-0 (and the
> extra-extra complication that I had to lock the MyISAM tables... glad to be
> moving away from that). For Cassandra I switched to ephemeral disks, as per
> recommendations from this forum.

Yes, if you want to snapshot MySQL consistently you need to get it into a
consistent state, so you need to do the whole FLUSH TABLES WITH READ LOCK
yadda yadda on top of the rest. Otherwise you might snapshot something that
is not correct/consistent... and it's a bit trickier with snapshotting
slaves, since you need to know where they are in the replication stream, etc.

> One note on EBS snapshots though: the last time I checked (which was some
> time ago) I noticed degraded IO performance on the box during the
> snapshotting process, even though the take-snapshot command returns almost
> immediately. My theory back then was that Amazon does the
> delta/compress/store "outside" of the VM, but it obviously has an effect
> on resources on the box the VM runs on. I was doing this on a MySQL slave
> that no one talked to, so I didn't care/bother looking into it further.

Yes, that is correct. The underlying copy-on-write-and-ship-to-EBS/S3 does
have some performance impact on the running box.
For the most part it has never presented a problem for us or many of our
customers, although you're right, it's something you want to know about and
keep in mind when designing your system (for example: snapshot slaves much
more often than masters, snapshot masters when traffic is low, stagger
Cassandra snapshots... yadda yadda).

If you think about it, this effect is not that different from using LVM
snapshots on the ephemeral disks and then moving the data from the snapshot
to another disk or to remote storage: moving those blocks would have an
impact on the original LVM volume too, since it is reading from the same
physical (ephemeral) disk(s) underneath (list of clean and dirty blocks).

One case where I could see the slightly reduced IO performance being
problematic is if your DB/storage is already at the edge of its I/O
capacity... but in that case, the small overhead of a snapshot is probably
the least of your problems :) EBS slowness or malfunction can also impact
the instance, obviously, although that is not related only to snapshots,
since it can affect the actual volume regardless.

Josep M.
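[Editor's note] The freeze/snap/unfreeze pattern discussed above can be sketched as a short script. This is a minimal sketch, not either poster's actual tooling: the mount point, volume id, and the `ec2-create-snapshot` command (from the 2011-era ec2-api-tools) are assumptions, and DRY_RUN defaults to on so the commands are only printed.

```shell
#!/bin/sh
# Sketch of the freeze -> EBS snapshot -> unfreeze pattern.
# MOUNT, VOLUME_ID and the ec2-create-snapshot CLI are assumptions;
# with DRY_RUN=1 (the default) commands are printed, not executed.
MOUNT=${MOUNT:-/var/lib/mysql}          # hypothetical XFS mount point
VOLUME_ID=${VOLUME_ID:-vol-12345678}    # hypothetical EBS volume id
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# For MySQL you would also hold FLUSH TABLES WITH READ LOCK in an open
# session for the duration of the freeze (a one-shot `mysql -e` releases
# the lock when the connection closes, which defeats the purpose).
run xfs_freeze -f "$MOUNT"                       # flush XFS, block writes
run ec2-create-snapshot "$VOLUME_ID" -d "backup" # returns almost immediately
run xfs_freeze -u "$MOUNT"                       # resume writes right away
```

Because the snapshot call returns before Amazon finishes the copy-on-write upload, the filesystem only needs to stay frozen for the few seconds the API call takes; the I/O impact described above happens afterwards, while the blocks are shipped in the background.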
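[Editor's note] The "stagger cassandra snaps" advice can likewise be sketched. The node addresses, stagger interval, and snapshot tag are illustrative assumptions (the positional tag argument matches 0.8-era `nodetool snapshot` usage), and DRY_RUN keeps the sketch from actually invoking nodetool.

```shell
#!/bin/sh
# Sketch: stagger `nodetool snapshot` across a ring so the nodes don't
# all take the snapshot I/O hit at the same time. NODES, STAGGER and
# TAG are illustrative; DRY_RUN=1 (the default) only prints commands.
NODES=${NODES:-"10.0.0.1 10.0.0.2 10.0.0.3"}   # hypothetical node IPs
STAGGER=${STAGGER:-300}                        # seconds between nodes
TAG=${TAG:-nightly}                            # snapshot name/tag
DRY_RUN=${DRY_RUN:-1}

snapshot_ring() {
  for node in $NODES; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "would run: nodetool -h $node snapshot $TAG"
    else
      nodetool -h "$node" snapshot "$TAG"
      sleep "$STAGGER"   # let the EBS/S3 shipping settle before the next node
    fi
  done
}

snapshot_ring
```

Spacing the nodes out trades a longer overall backup window for a smaller per-node I/O penalty, which fits the "do masters when traffic is low" advice above.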