cassandra-user mailing list archives

From Robert Coli <>
Subject Re: different disk foot print of cassandra data folder on copying
Date Wed, 05 Nov 2014 20:13:45 GMT
On Wed, Nov 5, 2014 at 12:08 PM, KZ Win <> wrote:

> I have cassandra nodes with long uptime.  Disk footprint for the
> cassandra data folder is different when I copy it to a different folder.

> I am talking about as much as 100% different for 25-40GB of data.  On
> copying they grow to double that.

1) Cassandra automatically "snapshots" SSTables when one does certain
2) One can also manually create snapshots.
3) Snapshots are hard links to files.
4) Hard links to files generally become duplicate files when copied to
another partition, unless rsync or cp is configured to maintain the hard
link relationship.
5) Snapshots are kept in a subdirectory of the data directory for the
6) This all has the pathological-seeming outcome that snapshots become
effectively larger as time passes (because the hard links they contain
become the only copy of the file when the "original" is deleted from the
data directory via compaction) and might grow significantly when copied.
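
Points 3, 4 and 6 can be reproduced without Cassandra. The following is a
minimal Python sketch, not anything Cassandra itself runs: the directory
layout and the file name demo-Data.db are made up for illustration, and
shutil.copytree stands in for any copy tool that does not preserve hard
links. It sums file sizes once per name and once per distinct inode, which
is roughly the difference between what a naive copy writes and what the
disk actually holds.

```python
import os
import shutil
import tempfile


def apparent_and_unique_bytes(root):
    """Return (bytes summed per file name, bytes summed per distinct inode)."""
    per_name = 0
    per_inode = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.stat(os.path.join(dirpath, name))
            per_name += st.st_size
            # Hard-linked names share (device, inode), so they are stored once.
            per_inode[(st.st_dev, st.st_ino)] = st.st_size
    return per_name, sum(per_inode.values())


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: one fake SSTable and a snapshot directory beside it.
    data = os.path.join(tmp, "data")
    snap = os.path.join(data, "snapshots", "demo")
    os.makedirs(snap)
    sstable = os.path.join(data, "demo-Data.db")
    with open(sstable, "wb") as f:
        f.write(b"x" * 4096)

    # The "snapshot" is a hard link: a second name for the same inode.
    os.link(sstable, os.path.join(snap, "demo-Data.db"))
    before = apparent_and_unique_bytes(data)

    # A plain recursive copy writes each name out as an independent file.
    copied = os.path.join(tmp, "copy")
    shutil.copytree(data, copied)
    after = apparent_and_unique_bytes(copied)

    # Simulate compaction deleting the "original"; the snapshot keeps the bytes.
    os.remove(sstable)
    after_compaction = apparent_and_unique_bytes(data)

print("source (per name, per inode):", before)
print("copy (per name, per inode):", after)
print("source after compaction:", after_compaction)
```

In the source the per-inode total is half the per-name total, while in the
copy the two are equal, so the copy occupies twice the space. After the
simulated compaction the snapshot is the only remaining name for the data,
which is why a snapshot appears to "grow" over time.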

tl;dr : modify your rsync to include --exclude=snapshots/
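
If the copy is done from a script rather than with rsync, the same idea
can be sketched in Python. This is an illustrative stand-in for the rsync
exclude above, not an equivalent of it: the helper name
copy_without_snapshots and the demo layout are hypothetical, and
shutil.ignore_patterns("snapshots") skips any entry named "snapshots" at
any depth, file or directory.

```python
import os
import shutil
import tempfile


def copy_without_snapshots(src, dst):
    """Copy src to dst, skipping anything named 'snapshots' at any depth."""
    return shutil.copytree(src, dst, ignore=shutil.ignore_patterns("snapshots"))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout standing in for a data directory with one snapshot.
    src = os.path.join(tmp, "data")
    snap = os.path.join(src, "snapshots", "demo")
    os.makedirs(snap)
    live = os.path.join(src, "demo-Data.db")
    with open(live, "wb") as f:
        f.write(b"x" * 4096)
    # The snapshot entry is a hard link to the live file.
    os.link(live, os.path.join(snap, "demo-Data.db"))

    dst = os.path.join(tmp, "copy")
    copy_without_snapshots(src, dst)
    copied_names = sorted(os.listdir(dst))

print(copied_names)
```

Only the live file reaches the destination. Note that, per point 6, any
file that survives solely inside a snapshot is left behind by a copy like
this, which is the intended effect of excluding snapshots.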

