From: Alex Major <al3xdm@gmail.com>
To: user@cassandra.apache.org
Date: Wed, 4 Jul 2012 13:52:17 +0100
Subject: Re: Expanding Cassandra on EC2 with consistency

Hi Dan,

We run RF 2 on RAID0 EBS drives. The reason we use EBS over on-instance storage is twofold.

Firstly, we have a relatively small cluster (4 nodes), so we're quite sensitive to any AWS issues at the region level. If we had a larger cluster then we would definitely use ephemeral storage, as it provides much more consistent (and slightly higher) throughput than EBS. I'm sure you've read a lot about how bad EBS performance is, but genuinely we see very little difference between EBS and ephemeral storage in terms of performance when in a RAID0 setup. Some numbers similar to our tests can be found here: http://stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-amazons-ec2.html.
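For anyone setting this up, assembling the stripe looks roughly like the following. This is a sketch only; the device names, volume count, filesystem and mount point are illustrative rather than a statement of our exact setup:

    # Assemble four attached EBS volumes into a single RAID0 array.
    # /dev/sdf-/dev/sdi are example device names; check what your
    # instance actually exposes (some AMIs use /dev/xvd* instead).
    mdadm --create /dev/md0 --level=0 --raid-devices=4 \
          /dev/sdf /dev/sdg /dev/sdh /dev/sdi

    # Put a filesystem on the array and mount it where Cassandra
    # keeps its data directory.
    mkfs.xfs /dev/md0
    mount /dev/md0 /var/lib/cassandra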
The only way you'll know for yourself whether EBS is acceptable is to run your own tests, but definitely take anything you read in blog posts etc. about EBS performance with a pinch of salt. The only real disadvantage of EBS in our experience is that it's an additional moving part that can fail. You can find yourself in a position where EC2 is running normally but EBS is down, and thus your nodes are down.

The second reason, and the most important for us, was that we didn't have time to build a good automated backup service for the ephemeral storage. The advantage of EBS is that the instance can fail and we can start a new one using the same drive, whereas data on ephemeral storage is lost if the node is lost or shut down. Using ephemeral storage would put us in the difficult position of losing some data should a couple of instances fail for whatever reason (we don't lose instances often, but when we do we tend to lose several). If you run a larger cluster (imo 10+ nodes) then definitely use ephemeral, as you shouldn't be very sensitive to losing a node or two.

The second point, however, doesn't hold as strongly today as it did when we made our decision last year. Netflix recently open-sourced a really good tool (https://github.com/Netflix/Priam/wiki/Backups) which automates the backup of Cassandra data to S3. I'd definitely recommend checking it out to see if it will help you with AWS backups and restores; we're currently looking at rolling it out.
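To give a flavour of what Priam automates: it is essentially the snapshot-then-upload cycle you would otherwise script by hand. A rough sketch, with the keyspace name, paths and bucket name all illustrative:

    # Take an on-disk snapshot of a keyspace (cheap hard links).
    nodetool -h localhost snapshot MyKeyspace

    # Snapshots land under the data directory, e.g.
    #   /var/lib/cassandra/data/MyKeyspace/<cf>/snapshots/<tag>/
    # Priam ships these to S3 on a schedule; by hand you could
    # approximate that with something like s3cmd:
    s3cmd sync /var/lib/cassandra/data/MyKeyspace/ \
          s3://my-backup-bucket/$(hostname)/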
Hope that helps,

Alex.

On Wed, Jul 4, 2012 at 2:56 AM, Dan Foody <dan.foody@gmail.com> wrote:

> Hi Alex,
>
> Can you share what replication factor you're running?
> And, are you using ephemeral disks or EBS volumes?
>
> Thanks!
>
> - Dan
>
> On Jul 3, 2012, at 5:52 PM, Alex Major wrote:
>
> Hi Mike,
>
> We've run a small (4 node) cluster in the EU region since September last
> year. We run across all 3 availability zones in the region, with 2 nodes
> in one AZ and one node in each of the other two. The latency difference
> between traffic within an AZ and traffic between AZs has been minimal in
> our experience.
>
> It's only when we've gone cross-region that there have been latency
> problems. We temporarily ran a 9 node cluster across 3 regions, however
> even then, using LOCAL_QUORUM, the latency was better than the standard
> datacenter-to-datacenter latency we're used to.
>
> EC2Snitch is definitely the way to go in favour of NTS in my opinion.
> NTS was a pain to get set up with the internal (private) IP addresses,
> so much so that we never got it replicating the data safely as we
> wanted.
>
> Alex.
>
> On Tue, Jul 3, 2012 at 2:16 PM, Michael Theroux <mtheroux2@yahoo.com> wrote:
>
>> Hello,
>>
>> We are currently running a web application utilizing Cassandra on EC2.
>> Given the recent outages experienced with Amazon, we want to consider
>> expanding Cassandra across availability zones sooner rather than later.
>>
>> We are trying to determine the optimal way to deploy Cassandra in this
>> deployment. We are researching the NetworkTopologyStrategy and the
>> EC2Snitch. We are also interested in providing a high level of read or
>> write consistency.
>>
>> My understanding is that the EC2Snitch recognizes availability zones as
>> racks, and regions as data centers. This seems to be a common
>> configuration. However, if we were to utilize queries with a READ or
>> WRITE consistency of QUORUM, would there be a high possibility that the
>> communication necessary to establish a quorum would cross availability
>> zones?
>>
>> My understanding is that the NetworkTopologyStrategy prefers that
>> replicas be stored on other racks within the datacenter, which would
>> equate to other availability zones in EC2. This implies to me that, in
>> order to have the quorum of nodes necessary to achieve consistency,
>> Cassandra will communicate with nodes across availability zones.
>>
>> First, is my understanding correct? Second, given the high latency that
>> can sometimes exist between availability zones, is this a problem, and
>> should we instead treat availability zones as data centers?
>>
>> Ideally, we would be able to set up a situation where we could store
>> replicas across availability zones in case of failure, but establish a
>> high level of read or write consistency within a single availability
>> zone.
>>
>> I appreciate your responses.
>> Thanks,
>> -Mike
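For concreteness on Mike's quorum question quoted above: with EC2Snitch the region is treated as the data centre and the availability zone as the rack, so a keyspace that spreads replicas across AZs is declared along these lines in cassandra-cli (a sketch; the keyspace name and replica count are illustrative):

    create keyspace Demo
      with placement_strategy = 'NetworkTopologyStrategy'
      and strategy_options = {us-east : 2};

Clients then choose the consistency level per request. With RF 2, QUORUM needs both replicas, and since NTS places them on different racks (i.e. AZs), a quorum read or write will indeed cross AZ boundaries.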