cassandra-user mailing list archives

From Alex Major <>
Subject Re: Cassandra on AWS suggestions for data safety
Date Thu, 24 Jul 2014 12:55:21 GMT
On Thu, Jul 24, 2014 at 12:47 PM, Hao Cheng <> wrote:

> Thanks for your response!
> We're planning on using the r3.large instances; they seem to offer the best
> price/performance for our application (the cheapest way to get both 15GB of
> RAM and SSD storage). Unfortunately, cost-wise we can't justify having
> beefier instances with satisfactory cluster sizes at this time.
> I guess I was thinking that EBS would be a closer, faster snapshot target
> that could potentially allow us to take snapshots more often, and we could
> take advantage of incremental EBS snapshots instead of having to do our own
> incremental backup system to S3.
> I looked into Priam but I was kind of driven away by the inability to
> handle vnodes, and a Cassandra admin I talked to out-of-band recommended
> staying away unless I was contemplating very large cluster sizes. I've also
> seen duplicity mentioned as a backup solution, although I know even less
> about it. Did you use Priam and if so, how were your experiences with it?
> On Thu, Jul 24, 2014 at 3:07 AM, Alex Major <> wrote:
>> On Thu, Jul 24, 2014 at 12:12 AM, Hao Cheng <> wrote:
>>> Hello,
>>> Based on what I've read in the archives here and on the documentation on
>>> Datastax and the Cassandra Community, EBS volumes, even provisioned IOPS
>>> with EBS optimized instances, are not recommended due to inconsistent
>>> performance. This I can deal with, but I was hoping for some
>>> recommendations from the community as far as solutions for data safety.
>>> I have a few ideas in mind:
>>> 1. Instance store for the database, then cassandra snapshots (via
>>> nodetool), stored on an EBS provisioned IOPS volume attached to the
>>> instance. That volume would serve to keep the DB safe in case of instance
>>> downtime, and I would set up regular snapshotting on the EBS volume for
>>> data safety (pushed to S3 and eventually glacier)
>>> 2. Instance store used as a bcache write-through cache for attached EBS
>>> volumes. The attached volumes persist all writes and are again snapshotted
>>> regularly.
>>> 3. Using a backup system, either manually via rsync or through something
>>> like Priam, to directly push backups of the data on ephemeral storage to S3.
>>> From where I'm sitting, #2 seems the easiest to set up, but could
>>> potentially cause problems if the EBS volume backing writes sees a spike in
>>> latency, driving up write times even if read times would remain fairly
>>> consistent.
>>> Do any of you all have recommendations or suggestions for a system like
>>> this?
>>> Thanks in advance!
>>> --Bryan
>> We have a cluster running that uses EBS with Provisioned IOPS and we get
>> good performance off them (comparable to instance store). The reason we're
>> moving off them is purely because EBS has been the thing that most often
>> crashes on AWS. The AWS SSD instance types are where we're heading and I'd
>> recommend them if you can. Also make sure to keep at least 3 replicas.
>> Things tend to fail more regularly on AWS, so the extra replicas will keep
>> you from having immediate problems.
>> Our setup is to snapshot the instance stores and sync to S3. Not sure why
>> you'd sync to EBS really. Priam, which you mentioned, makes keeping backups
>> (snapshots) and storing them on S3 really simple -
EBS traffic seems to go over the same network as anything you'll send to S3,
and I've never seen much of a performance difference streaming to either. If
you're using the EBS drives as a stopgap before sending to S3 later, I'd just
send the snapshots straight to S3.

Cassandra has incremental backups as well as full snapshots. If you read the
Priam link, it describes the process Priam uses, and it's fairly easy to
replicate. We created a script that runs per node, watches for new
SSTables, and sends them to S3 with integrity checks. We've done full
restores from the S3 copies a few times. It's not as fully fledged as the
Priam method (which creates a meta.json etc.) but it fits what we needed.
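
A per-node script of the kind described above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not the script
mentioned in this message: the function names, the in-memory `seen` set, and
the `upload` callable are all hypothetical, and the actual S3 transfer is left
to whatever client you use. It assumes that SSTable files are immutable once
written (so each file only needs shipping once), that in-progress files carry
`-tmp-` in their names, and that the uploader returns the MD5 hex digest
computed on the remote side so it can be compared with the local one.

```python
import hashlib
from pathlib import Path
from typing import Callable, List, Set


def md5_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def new_sstable_files(data_dir: Path, seen: Set[str]) -> List[Path]:
    """List files under data_dir that have not been shipped yet."""
    found = []
    for path in sorted(data_dir.rglob("*")):
        if not path.is_file():
            continue
        # Assumption: files still being written have "-tmp-" in their name.
        if "-tmp-" in path.name:
            continue
        if path.relative_to(data_dir).as_posix() not in seen:
            found.append(path)
    return found


def ship_new_files(
    data_dir: Path,
    seen: Set[str],
    upload: Callable[[Path, str], str],
) -> List[str]:
    """Upload each new file once and verify its checksum.

    `upload(path, key)` is a hypothetical callable that stores the file
    under `key` and returns the MD5 hex digest seen by the remote side.
    """
    shipped = []
    for path in new_sstable_files(data_dir, seen):
        key = path.relative_to(data_dir).as_posix()
        local_digest = md5_hex(path)
        remote_digest = upload(path, key)
        if remote_digest != local_digest:
            # Leave the file out of `seen` so the next run retries it.
            raise IOError(f"checksum mismatch for {key}")
        seen.add(key)
        shipped.append(key)
    return shipped
```

In practice `seen` would need to be persisted between runs (a local file, for
example), and the function would be called on a timer or from a filesystem
watcher pointed at the node's data or backups directories.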

We don't use Priam as we built some of our own tooling before Priam was
known. I think there's a fork or a branch somewhere with basic vnode support
working, though I'm not certain. You could try running that branch but only
use the backup/restore features and leave the others aside. Test backing up,
tearing down your cluster, and then restoring it.
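
One way to make such a backup/teardown/restore test checkable is to record
checksums before the teardown and compare them after the restore. The sketch
below is an assumption about how that could be done, not something taken from
Priam or from the tooling described in this message; `build_manifest` and
`diff_manifests` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch, but
            # large SSTables would want chunked reads instead.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Report files that are missing or changed after the restore."""
    problems = []
    for name, digest in sorted(before.items()):
        if name not in after:
            problems.append(f"missing: {name}")
        elif after[name] != digest:
            problems.append(f"changed: {name}")
    return problems
```

Build one manifest from the backed-up files and another from the restored
directory; an empty list from `diff_manifests` means every backed-up file came
back byte-for-byte identical.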

My only other comment is that, by staging on EBS first, you're potentially
adding more places for the backups to fail between taking them and having
them somewhere safe. I'd get them off the instance as soon as possible.
