Subject: Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?
Date: Thu, 23 Jun 2011 08:53:59 -0700
From: Josep Blanquer <blanquer@rightscale.com>
To: user@cassandra.apache.org

On Thu, Jun 23, 2011 at 8:02 AM, William Oberman wrote:

> I've been doing EBS snapshots for MySQL for some time now, and was using a
> similar pattern as Josep (XFS with freeze, snap, unfreeze), with the extra
> complication that I was actually using 8 EBS volumes in RAID-0 (and the
> extra-extra complication that I had to lock the MyISAM tables... glad to be
> moving away from that). For Cassandra I switched to ephemeral disks, as per
> recommendations from this forum.

Yes, if you want to snapshot MySQL consistently you need to get it into a
consistent state, so you need to do the whole FLUSH TABLES WITH READ LOCK
yadda yadda on top of the rest. Otherwise you might snapshot something that
is not correct/consistent... and it's a bit trickier with snapshotting
slaves, since you need to know where they are in the replication stream, etc.

> One note on EBS snapshots though: the last time I checked (which was some
> time ago) I noticed degraded IO performance on the box during the
> snapshotting process, even though the take-snapshot command returns almost
> immediately. My theory back then was that Amazon does the
> delta/compress/store "outside" of the VM, but it obviously has an effect
> on resources on the box the VM runs on. I was doing this on a MySQL slave
> that no one talked to, so I didn't care/bother looking into it further.

Yes, that is correct. The underlying copy-on-write-and-ship-to-EBS/S3 does
have some performance impact on the running box.
For the most part it has never presented a problem for us or many of our
customers, although you're right, it's something you want to know about and
keep in mind when designing your system (for example: snapshot slaves much
more often than masters, snapshot masters when traffic is low, stagger
Cassandra snapshots... yadda yadda).

If you think about it, this effect is not that different from using LVM
snapshots on the ephemeral disks and then moving the data from the snapshot
to another disk or to remote storage: moving those blocks would have an
impact on the original LVM volume too, since it is reading from the same
physical (ephemeral) disk(s) underneath (list of clean and dirty blocks).

One case where I could see the slightly reduced IO performance being
problematic is if your DB/storage is already at the edge of its I/O
capacity... but in that case, the small overhead of a snapshot is probably
the least of your problems :) EBS slowness or malfunction can also impact
the instance, obviously, although that is not related only to snapshots,
since it can affect the actual volume regardless.

Josep M.
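[Editor's note] The freeze/snap/unfreeze pattern discussed above can be sketched as a short script. This is a minimal sketch, not either poster's actual tooling: the mount point, volume id, and the `ec2-create-snapshot` command (from the 2011-era ec2-api-tools) are assumptions, and DRY_RUN defaults to on so the commands are only printed.

```shell
#!/bin/sh
# Sketch of the freeze -> EBS snapshot -> unfreeze pattern.
# MOUNT, VOLUME_ID and the ec2-create-snapshot CLI are assumptions;
# with DRY_RUN=1 (the default) commands are printed, not executed.
MOUNT=${MOUNT:-/var/lib/mysql}          # hypothetical XFS mount point
VOLUME_ID=${VOLUME_ID:-vol-12345678}    # hypothetical EBS volume id
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# For MySQL you would also hold FLUSH TABLES WITH READ LOCK in an open
# session for the duration of the freeze (a one-shot `mysql -e` releases
# the lock when the connection closes, which defeats the purpose).
run xfs_freeze -f "$MOUNT"                       # flush XFS, block writes
run ec2-create-snapshot "$VOLUME_ID" -d "backup" # returns almost immediately
run xfs_freeze -u "$MOUNT"                       # resume writes right away
```

Because the snapshot call returns before Amazon finishes the copy-on-write upload, the filesystem only needs to stay frozen for the few seconds the API call takes; the I/O impact described above happens afterwards, while the blocks are shipped in the background.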
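[Editor's note] The "stagger cassandra snaps" advice can likewise be sketched. The node addresses, stagger interval, and snapshot tag are illustrative assumptions (the positional tag argument matches 0.8-era `nodetool snapshot` usage), and DRY_RUN keeps the sketch from actually invoking nodetool.

```shell
#!/bin/sh
# Sketch: stagger `nodetool snapshot` across a ring so the nodes don't
# all take the snapshot I/O hit at the same time. NODES, STAGGER and
# TAG are illustrative; DRY_RUN=1 (the default) only prints commands.
NODES=${NODES:-"10.0.0.1 10.0.0.2 10.0.0.3"}   # hypothetical node IPs
STAGGER=${STAGGER:-300}                        # seconds between nodes
TAG=${TAG:-nightly}                            # snapshot name/tag
DRY_RUN=${DRY_RUN:-1}

snapshot_ring() {
  for node in $NODES; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "would run: nodetool -h $node snapshot $TAG"
    else
      nodetool -h "$node" snapshot "$TAG"
      sleep "$STAGGER"   # let the EBS/S3 shipping settle before the next node
    fi
  done
}

snapshot_ring
```

Spacing the nodes out trades a longer overall backup window for a smaller per-node I/O penalty, which fits the "do masters when traffic is low" advice above.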