cassandra-user mailing list archives

From Carl Mueller <carl.muel...@smartthings.com.INVALID>
Subject Re: AWS ephemeral instances + backup
Date Mon, 09 Dec 2019 20:02:56 GMT
Jeff: the gp2 drives are expensive, especially since you have to make
them unnecessarily large to get the IOPS (gp2 scales at roughly 3 IOPS
per GB), and I want the per-node cost as cheap as possible so I can run
as many nodes as possible.

i3 + a cheap rust EBS volume for backup beats an m5 or similar instance
+ EBS gp2 on cost when I ran the numbers.

Ben: Going to S3 would be even cheaper and probably about the same
speed. I think I was avoiding it because of the network cost and the
throttling/not-throttling question, but if it is cheap enough versus
the rust EBS then I'll do that. I think I came across your page during
my earlier research.
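
For reference, the kind of S3 shipping I have in mind is tiny; a sketch
(assuming boto3, and the bucket/prefix names here are made up):

    import boto3

    # Hypothetical names; a colder storage class is part of what makes
    # S3 cheaper per GB than a dedicated backup EBS volume.
    s3 = boto3.client("s3")

    def ship_sstable(path, bucket="my-cassandra-backups", prefix="node-1/"):
        """Upload one immutable sstable file to S3."""
        key = prefix + path.rsplit("/", 1)[-1]
        s3.upload_file(path, bucket, key,
                       ExtraArgs={"StorageClass": "STANDARD_IA"})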

Jon: I have my own tool that is very similar to Medusa but supports our
wonky modes of access (bastions, ipv6, etc.), with comparable
incremental backups and the like. The backups currently run at
scheduled times, but my rewrite would enable a more local strategy by
watching the sstabledirs (sketched below). Medusa's restore modes are
better in some respects, but I can do more complicated things too. In
the rewrite I'm trying to abstract over access mode (k8s/ssh/etc.),
cloud, and even tech (kafka/cassandra), and it is damn hard to avoid
leaky abstractions.
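
The "watching the sstabledirs" part doesn't need to be much more than a
poll loop; a sketch (stdlib only, a real version would use inotify, and
the path here is hypothetical):

    import os
    import time

    DATA_DIR = "/var/lib/cassandra/data"  # hypothetical path

    def watch_sstable_dirs(on_new_file, interval=30):
        """Poll the data dirs and fire a callback for each file that
        appears; sstables are immutable, so a new name means new data."""
        seen = set()
        while True:
            for root, _dirs, files in os.walk(DATA_DIR):
                for name in files:
                    path = os.path.join(root, name)
                    if path not in seen:
                        seen.add(path)
                        on_new_file(path)  # e.g. queue it for upload
            time.sleep(interval)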

Reid: possibly we could, but the EBS snapshot needs to move the full
~100GB every time, while the various sstable copies/incremental backups
only ship the new files, so the raw number of bits being saved is
smaller, which makes the backup faster and more resilient.
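
To be concrete about "just do the new files": each backup run only has
to ship the set difference against the previous manifest; a sketch
(hypothetical names):

    import os

    def new_files_since(data_dir, last_manifest):
        """Return relative paths present now that weren't in the last
        backup manifest; for a mostly-static 100GB of sstables this is
        a small fraction of what a full volume copy moves."""
        current = set()
        for root, _dirs, files in os.walk(data_dir):
            for name in files:
                rel = os.path.relpath(os.path.join(root, name), data_dir)
                current.add(rel)
        return current - last_manifest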

Thank you everyone; at least with all you bigwigs giving advice I can
make an appeal-to-authority argument to management :-) (which is always
more effective than arguing from reason or evidence)


On Fri, Dec 6, 2019 at 9:18 AM Reid Pinchback <rpinchback@tripadvisor.com>
wrote:

> Correction:  “most of your database will be in chunk cache, or buffer
> cache anyways.”
>
>
>
> *From: *Reid Pinchback <rpinchback@tripadvisor.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Friday, December 6, 2019 at 10:16 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: AWS ephemeral instances + backup
>
>
>
> If you’re only going to have a small storage footprint per node, like
> 100gb, another option comes to mind. Use an instance type with large RAM.
> Use an EBS storage volume on an EBS-optimized instance type, and take EBS
> snapshots. Most of your database will be in chunk cache anyways, so you
> only need to make sure that the dirty background writer is keeping up.  I’d
> take a look at iowait during a snapshot and see if the results are
> acceptable for a running node.  Even if it is marginal, if you’re only
> snapshotting one node at a time, then speculative retry would just skip
> over the temporary slowpoke.
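>
> A quick way to eyeball that during a snapshot; a sketch reading
> /proc/stat (field order per proc(5); the interval is arbitrary):
>
>     import time
>
>     def iowait_fraction(interval=5):
>         """Sample /proc/stat twice and return the share of CPU time
>         spent in iowait over the interval. Fields after the 'cpu'
>         label are user nice system idle iowait irq softirq steal..."""
>         def read():
>             with open("/proc/stat") as f:
>                 return [int(x) for x in f.readline().split()[1:]]
>         a = read()
>         time.sleep(interval)
>         b = read()
>         delta = [bi - ai for ai, bi in zip(a, b)]
>         return delta[4] / sum(delta)  # index 4 == iowait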
>
>
>
> *From: *Carl Mueller <carl.mueller@smartthings.com.INVALID>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Thursday, December 5, 2019 at 3:21 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *AWS ephemeral instances + backup
>
>
>
> Does anyone have experience with tooling written to support this strategy:
>
> Use case: run cassandra on i3 instances on ephemerals but synchronize the
> sstables and commitlog files to the cheapest EBS volume type (those have
> bad IOPS but decent enough throughput)
>
> On node replace, the startup script for the node back-copies the
> sstables and commitlog state from the EBS volume to the ephemeral.
>
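> Mechanically I'm picturing nothing fancier than rsync in both
> directions; a sketch (the mount points are made up):
>
>     import subprocess
>
>     EPHEMERAL = "/mnt/ephemeral/cassandra/"   # hypothetical mounts
>     EBS_MIRROR = "/mnt/ebs-backup/cassandra/"
>
>     def sync_to_ebs():
>         """Ongoing direction: mirror sstables + commitlogs onto the
>         cheap EBS volume; --delete drops files removed by compaction."""
>         subprocess.run(["rsync", "-a", "--delete", EPHEMERAL, EBS_MIRROR],
>                        check=True)
>
>     def restore_from_ebs():
>         """Node-replace direction: run from the startup script before
>         cassandra starts, restoring state onto the fresh ephemeral."""
>         subprocess.run(["rsync", "-a", EBS_MIRROR, EPHEMERAL], check=True)
>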
> As can be seen:
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
>
> the (presumably) spinning rust tops out at 2375 MB/sec (presumably by
> striping multiple EBS volumes). That would incur about a ten minute
> delay for node replacement on a 1TB node (1 TB at ~2.4 GB/sec is
> roughly seven minutes of raw copy time, plus overhead). But I imagine
> this would only be used on higher IOPS r/w nodes with smaller
> densities, so 100GB would be about a minute of delay, already within
> the timeframes of an AWS node replacement/instance restart.
>
>
>
