cassandra-user mailing list archives

From Jonathan Haddad <...@jonhaddad.com>
Subject Re: Backups eating up disk space
Date Tue, 10 Jan 2017 17:26:54 GMT
If you remove the files from the backup directory, you would not have data
loss in the case of a node going down.  They're hard links to the same
files that are in your data directory, and are created when an sstable is
written to disk.  At that point they take up (almost) no extra space, so
they aren't a big deal.  But when the sstable gets compacted away, the hard
link in the backups directory sticks around, so the space is never freed.
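
To make the hard-link behaviour concrete, here is a minimal sketch in plain Python that imitates it in a temporary directory. It does not touch Cassandra at all; the file name `example-Data.db` and the function `hard_link_demo` are made up for illustration, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def hard_link_demo():
    """Mimic an sstable and its incremental-backup hard link in a temp dir."""
    with tempfile.TemporaryDirectory() as table_dir:
        backups_dir = os.path.join(table_dir, "backups")
        os.mkdir(backups_dir)

        # Hypothetical file name standing in for a freshly flushed sstable.
        sstable = os.path.join(table_dir, "example-Data.db")
        with open(sstable, "wb") as f:
            f.write(b"x" * 4096)

        # The backup is a second name for the same inode, not a copy.
        backup = os.path.join(backups_dir, "example-Data.db")
        os.link(sstable, backup)
        same_inode = os.path.samefile(sstable, backup)
        links_before = os.stat(backup).st_nlink

        # Stand-in for compaction deleting the original from the data dir.
        os.remove(sstable)
        links_after = os.stat(backup).st_nlink
        size_after = os.path.getsize(backup)

        return same_inode, links_before, links_after, size_after


if __name__ == "__main__":
    print(hard_link_demo())
```

While both names exist the data is stored once; after the original name is removed, the backup name is the only thing keeping those blocks allocated, which is why the backups directory grows over time.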

Usually you use incremental backups as a means of moving the sstables off
the node to a backup location.  If you're not doing anything with them,
they're just wasting space and you should disable incremental backups.
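
If nothing consumes the backups, incremental backups can be turned off with `nodetool disablebackup` on a running node and `incremental_backups: false` in cassandra.yaml. As a rough sketch of the cleanup itself, the Python below walks the `<CASS_DATA_VOL>/data/<KEYSPACE>/<CF>/backups` layout described in the original question, reports how much each backups directory holds, and optionally removes the files. The function names `backups_usage` and `clear_backups` are hypothetical, and the directory layout is assumed from this thread rather than checked against any particular Cassandra version.

```python
import os


def backups_usage(data_root):
    """Return {backups_dir: bytes} for each <keyspace>/<table>/backups dir."""
    usage = {}
    for keyspace in sorted(os.listdir(data_root)):
        ks_path = os.path.join(data_root, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            backups = os.path.join(ks_path, table, "backups")
            if not os.path.isdir(backups):
                continue
            # Apparent size of the files; links still shared with a live
            # sstable are counted too, so this is an upper bound on savings.
            usage[backups] = sum(
                entry.stat().st_size
                for entry in os.scandir(backups)
                if entry.is_file(follow_symlinks=False)
            )
    return usage


def clear_backups(data_root, dry_run=True):
    """List (and, unless dry_run, delete) the files in every backups dir."""
    affected = []
    for backups in backups_usage(data_root):
        for entry in os.scandir(backups):
            if entry.is_file(follow_symlinks=False):
                affected.append(entry.path)
                if not dry_run:
                    os.remove(entry.path)
    return affected
```

Calling `clear_backups(data_root)` with the default `dry_run=True` only lists what would be deleted. Only files inside `backups` directories are touched, so the live sstables in the table directory keep their own names and data.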

Some people take snapshots and then rely on incremental backups.  Others
use the tablesnap utility, which does sort of the same thing.

On Tue, Jan 10, 2017 at 9:18 AM Kunal Gangakhedkar <kgangakhedkar@gmail.com>
wrote:

> Thanks for quick reply, Jon.
>
> But, what about in case of node/cluster going down? Would there be data
> loss if I remove these files manually?
>
> How is it typically managed in production setups?
> What are the best-practices for the same?
> Do people take snapshots on each node before removing the backups?
>
> This is my first production deployment - so, still trying to learn.
>
> Thanks,
> Kunal
>
> On 10 January 2017 at 21:36, Jonathan Haddad <jon@jonhaddad.com> wrote:
>
> You can just delete them off the filesystem (rm)
>
> On Tue, Jan 10, 2017 at 8:02 AM Kunal Gangakhedkar <
> kgangakhedkar@gmail.com> wrote:
>
> Hi all,
>
> We have a 3-node cassandra cluster with incremental backup set to true.
> Each node has 1TB data volume that stores cassandra data.
>
> The load in the output of 'nodetool status' comes up at around 260GB each
> node.
> All our keyspaces use replication factor = 3.
>
> However, the df output shows the data volumes consuming around 850GB of
> space.
> I checked the keyspace directory structures - most of the space goes in
> <CASS_DATA_VOL>/data/<KEYSPACE>/<CF>/backups.
>
> We have never manually run snapshots.
>
> What is the typical procedure to clear the backups?
> Can it be done without taking the node offline?
>
> Thanks,
> Kunal
>
>
>
