cassandra-user mailing list archives

From Hao Cheng <br...@critica.io>
Subject Re: Cassandra on AWS suggestions for data safety
Date Thu, 24 Jul 2014 11:47:14 GMT
Thanks for your response!

We're planning on using r3.large instances; they seem to offer the best
price/performance for our application (the cheapest way to get both 15GB of
RAM and SSD storage). Unfortunately, cost-wise we can't justify beefier
instances at satisfactory cluster sizes at this time.

I guess I was thinking that EBS would be a closer, faster snapshot target
that could potentially allow us to take snapshots more often, and we could
take advantage of incremental EBS snapshots instead of having to do our own
incremental backup system to S3.
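For comparison, a do-it-yourself incremental backup to S3 can lean on the fact that Cassandra SSTable files are immutable once written: a file that has been uploaded once never needs re-uploading, so each run only has to ship files not yet recorded in a manifest. The following is a minimal Python sketch, assuming a hypothetical manifest that maps relative file paths to their sizes at upload time; `scan_data_dir`, `files_to_upload`, and the manifest format are illustrative names, not part of any existing tool.

```python
from __future__ import annotations

from pathlib import Path


def scan_data_dir(data_dir: str) -> dict[str, int]:
    """Map each live file under data_dir (relative POSIX path) to its size in bytes."""
    root = Path(data_dir)
    files = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip snapshot directories (hard links created by `nodetool snapshot`);
        # this sketch only tracks the live SSTables in each table directory.
        if path.is_file() and "snapshots" not in rel.parts:
            files[rel.as_posix()] = path.stat().st_size
    return files


def files_to_upload(local: dict[str, int], manifest: dict[str, int]) -> list[str]:
    """Return files that are new or whose size differs from the manifest entry.

    Because SSTables are immutable, a file already recorded in the (hypothetical)
    manifest with the same size is treated as already backed up.
    """
    return sorted(rel for rel, size in local.items() if manifest.get(rel) != size)
```

After each successful upload run, the manifest would be updated with the uploaded entries; an incremental EBS snapshot gives you the equivalent bookkeeping without maintaining this yourself.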

I looked into Priam, but I was driven away by its inability to handle
vnodes, and a Cassandra admin I talked to out-of-band recommended staying
away unless I was contemplating very large cluster sizes. I've also seen
duplicity mentioned as a backup solution, although I know even less about
it. Did you use Priam, and if so, how was your experience with it?


On Thu, Jul 24, 2014 at 3:07 AM, Alex Major <al3xdm@gmail.com> wrote:

> On Thu, Jul 24, 2014 at 12:12 AM, Hao Cheng <bryan@critica.io> wrote:
>
>> Hello,
>>
>> Based on what I've read in the archives here and in the documentation from
>> DataStax and the Cassandra community, EBS volumes, even Provisioned IOPS
>> volumes on EBS-optimized instances, are not recommended due to inconsistent
>> performance. This I can deal with, but I was hoping for some
>> recommendations from the community on solutions for data safety.
>>
>> I have a few ideas in mind:
>>
>> 1. Instance store for the database, then Cassandra snapshots (via
>> nodetool) stored on an EBS Provisioned IOPS volume attached to the
>> instance. That volume would keep the data safe in case of instance
>> downtime, and I would set up regular snapshotting of the EBS volume for
>> data safety (pushed to S3 and eventually Glacier).
>>
>> 2. Instance store used as a bcache write-through cache for attached EBS
>> volumes. The attached volumes persist all writes and are again snapshotted
>> regularly.
>>
>> 3. Using a backup system, either manually via rsync or through something
>> like Priam, to directly push backups of the data on ephemeral storage to S3.
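Option 1 boils down to "take a nodetool snapshot, copy it onto the attached volume, then drop the local hard links". A minimal Python sketch of that loop, assuming the Cassandra 2.0-style layout `<data_dir>/<keyspace>/<table>/snapshots/<tag>` and a hypothetical EBS mount point passed in as `dest_root`; the function names are illustrative only.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def snapshot_command(tag: str, keyspace: str) -> list[str]:
    """Build the nodetool invocation that snapshots one keyspace under a tag."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def destination_for(snapshot_dir: Path, data_dir: Path, dest_root: Path) -> Path:
    """Map <data_dir>/<ks>/<table>/snapshots/<tag> to <dest_root>/<tag>/<ks>/<table>."""
    keyspace, table, _, tag = snapshot_dir.relative_to(data_dir).parts
    return dest_root / tag / keyspace / table


def snapshot_to_volume(keyspace: str, tag: str, data_dir: str, dest_root: str) -> None:
    """Snapshot a keyspace, copy the snapshot onto the attached volume, then clear it locally."""
    subprocess.run(snapshot_command(tag, keyspace), check=True)
    data, dest = Path(data_dir), Path(dest_root)
    for snap in data.glob(f"{keyspace}/*/snapshots/{tag}"):
        # copytree creates the intermediate <tag>/<ks>/<table> directories.
        shutil.copytree(snap, destination_for(snap, data, dest))
    # Remove the local hard links so they don't pin old SSTables on the instance store.
    subprocess.run(["nodetool", "clearsnapshot", "-t", tag, keyspace], check=True)
```

The EBS volume would then be snapshotted on its own schedule; note that the copy step reads the whole snapshot each time, which is part of why syncing directly to S3 is often preferred.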
>>
>> From where I'm sitting, #2 seems the easiest to set up, but could
>> potentially cause problems if the EBS volume backing writes sees a spike in
>> latency, driving up write times even if read times would remain fairly
>> consistent.
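The latency concern with #2 follows from what write-through means: a write is only acknowledged once the backing device (here, EBS) has it, so write latency can never drop below the backing device's latency, while cached reads stay fast. The toy model below illustrates that asymmetry; it is a sketch of write-through caching in general, not of bcache's internals, and the millisecond figures are made-up parameters rather than measurements.

```python
from __future__ import annotations


class WriteThroughCache:
    """Toy write-through cache in front of a slower backing device.

    cache_ms and backing_ms are illustrative per-operation costs in milliseconds.
    """

    def __init__(self, cache_ms: float, backing_ms: float) -> None:
        self.cache: dict[int, bytes] = {}
        self.backing: dict[int, bytes] = {}
        self.cache_ms = cache_ms
        self.backing_ms = backing_ms

    def write(self, block: int, data: bytes) -> float:
        # Write-through: the block goes to both the cache and the backing device,
        # and the write completes only when the slower of the two has finished
        # (assuming both writes are issued in parallel).
        self.backing[block] = data
        self.cache[block] = data
        return max(self.cache_ms, self.backing_ms)

    def read(self, block: int) -> tuple[bytes, float]:
        # Cache hits are served from the local SSD.
        if block in self.cache:
            return self.cache[block], self.cache_ms
        # Misses go to the backing device and populate the cache.
        data = self.backing[block]
        self.cache[block] = data
        return data, self.backing_ms
```

With `cache_ms=1.0` and `backing_ms=5.0`, every write costs 5.0 while repeated reads cost 1.0, so a spike in `backing_ms` shows up directly in write times even though read times stay flat.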
>>
>> Do any of you have recommendations or suggestions for a setup like
>> this?
>>
>> Thanks in advance!
>>
>> --Bryan
>>
>
> We have a cluster running on EBS with Provisioned IOPS, and we get
> good performance from it (comparable to instance store). The reason we're
> moving off it is purely that EBS has been the thing that most often
> crashes on AWS. The AWS SSD instance types are where we're heading, and I'd
> recommend them if you can. Also make sure to keep at least 3 replicas;
> things tend to fail more regularly, so that will keep you from having
> immediate problems.
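One way to see why 3 replicas is the usual minimum: assuming the application reads and writes at QUORUM consistency, a request needs floor(RF / 2) + 1 replicas to respond. The short Python sketch below (function names are illustrative) computes how many replicas of a partition can be down while QUORUM requests still succeed.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_that_can_fail(replication_factor: int) -> int:
    """Replicas of a partition that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: QUORUM={quorum(rf)}, tolerates {replicas_that_can_fail(rf)} down")
```

RF=2 gives a QUORUM of 2 and tolerates no down replicas, while RF=3 gives a QUORUM of 2 and tolerates one, so a single failed instance does not immediately cause failed requests.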
>
> Our setup is to snapshot the instance stores and sync to S3. I'm not really
> sure why you'd sync to EBS. Priam, which you mentioned, makes keeping
> backups (snapshots) and storing them on S3 really simple:
> https://github.com/Netflix/Priam/wiki/Backups
>
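The "snapshot the instance stores and sync to S3" setup can be sketched in a few lines of Python with boto3's `upload_file`. This is a minimal, non-incremental sketch, assuming boto3 is installed and AWS credentials are configured; the key layout and the `bucket`, `prefix`, and `host` parameters are hypothetical choices, and it assumes the `<data_dir>/<keyspace>/<table>/snapshots/<tag>/<file>` layout. Priam handles scheduling, incremental uploads, and restores that this sketch does not.

```python
from __future__ import annotations

from pathlib import Path


def s3_key(prefix: str, host: str, data_dir: Path, file_path: Path) -> str:
    """Build an S3 key for one snapshot file (hypothetical layout).

    <data_dir>/<ks>/<table>/snapshots/<tag>/<file> -> <prefix>/<host>/<tag>/<ks>/<table>/<file>
    """
    keyspace, table, _, tag, name = file_path.relative_to(data_dir).parts
    return f"{prefix}/{host}/{tag}/{keyspace}/{table}/{name}"


def sync_snapshot(bucket: str, prefix: str, host: str, data_dir: str, tag: str) -> int:
    """Upload every file of snapshot `tag` to S3 and return the number of files uploaded."""
    import boto3  # imported here so s3_key() can be used without boto3 installed

    s3 = boto3.client("s3")
    root = Path(data_dir)
    count = 0
    for path in sorted(root.glob(f"*/*/snapshots/{tag}/*")):
        if path.is_file():
            s3.upload_file(str(path), bucket, s3_key(prefix, host, root, path))
            count += 1
    return count
```

Including the host in the key keeps each node's snapshot separate, since every node snapshots only the token ranges it owns.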
