cassandra-user mailing list archives

From Reid Pinchback <rpinchb...@tripadvisor.com>
Subject Re: AWS ephemeral instances + backup
Date Fri, 06 Dec 2019 15:16:19 GMT
If you're only going to have a small storage footprint per node, like 100GB, another option
comes to mind. Use an instance type with large RAM, put the data on an EBS volume attached to
an EBS-optimized instance type, and take EBS snapshots. Most of your database will be in chunk
cache anyway, so you only need to make sure that the dirty background writer is keeping up.
I'd take a look at iowait during a snapshot and see whether the results are acceptable for a
running node. Even if it's marginal, if you're only snapshotting one node at a time, speculative
retry would just skip over the temporary slowpoke.
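
Purely as a sketch of what I mean (boto3 assumed, volume ID and region are placeholders,
iostat assumed to be installed on the box):

import subprocess
import time

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # region is a placeholder

DATA_VOLUME_ID = "vol-0123456789abcdef0"  # hypothetical Cassandra data volume

# Kick off a snapshot of this node's data volume.
snap = ec2.create_snapshot(VolumeId=DATA_VOLUME_ID,
                           Description="cassandra data volume snapshot")

# While the snapshot is in progress, sample %iowait with iostat so you can
# judge whether the node stays responsive enough to keep serving reads.
while True:
    state = ec2.describe_snapshots(
        SnapshotIds=[snap["SnapshotId"]])["Snapshots"][0]["State"]
    out = subprocess.run(["iostat", "-c", "1", "2"],
                         capture_output=True, text=True).stdout
    cpu_line = [l for l in out.splitlines() if l.strip()][-1]
    print(state, cpu_line)  # last non-empty line holds the %iowait column
    if state == "completed":
        break
    time.sleep(30)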

From: Carl Mueller <carl.mueller@smartthings.com.INVALID>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, December 5, 2019 at 3:21 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: AWS ephemeral instances + backup

Does anyone have experience with tooling written to support this strategy:

Use case: run Cassandra on i3 instances on ephemeral storage, but synchronize the sstables and
commitlog files to the cheapest EBS volume type (those have bad IOPS but decent enough throughput).

On node replace, the startup script for the node back-copies the sstables and commitlog state
from the EBS volume to the ephemeral storage.
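
A rough sketch of the startup back-copy I have in mind (mount points are placeholders and
rsync is assumed to be on the host; the ongoing forward sync would be the same thing in
the other direction):

import subprocess

EBS_MIRROR = "/mnt/ebs-mirror"          # cheap EBS volume holding the synced copy
EPHEMERAL_DATA = "/var/lib/cassandra"   # local NVMe ephemeral data directory

# Restore sstables and commitlog state onto the ephemeral before Cassandra starts.
for subdir in ("data", "commitlog"):
    subprocess.run(
        ["rsync", "-a", "--delete",
         f"{EBS_MIRROR}/{subdir}/", f"{EPHEMERAL_DATA}/{subdir}/"],
        check=True,
    )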

As can be seen: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

The (presumably) spinning rust tops out at 2375 MB/sec (presumably by using multiple EBS
volumes). That would incur roughly a ten-minute delay for node replacement on a 1TB node, but I
imagine this would only be used on higher-IOPS read/write nodes with smaller densities, so 100GB
would be only about a minute of delay, already within the timeframe of an AWS node
replacement/instance restart.
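
Rough math for that, assuming the restore copy is purely throughput-bound at the figure above:

# Back-of-the-envelope restore time at ~2375 MB/sec aggregate EBS throughput.
def restore_minutes(size_gb, throughput_mb_s=2375.0):
    return size_gb * 1000.0 / throughput_mb_s / 60.0

print(round(restore_minutes(1000), 1))  # 1TB node: ~7 minutes
print(round(restore_minutes(100), 1))   # 100GB node: under a minute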
