cassandra-user mailing list archives

From AJ
Subject Re: Backups, Snapshots, SSTable Data Files, Compaction
Date Tue, 07 Jun 2011 07:20:21 GMT
On 6/6/2011 11:25 PM, Benjamin Coverston wrote:
>> Currently, my data dir has about 16 sets.  I thought that compaction
>> (with nodetool) would clean up these files, but it doesn't.  Neither
>> does cleanup or repair.
> You're not even talking about snapshots using nodetool snapshot yet.
> Also, nodetool compact does compact all of the live files; however, the
> compacted SSTables will not be cleaned up until a garbage collection
> is triggered, or a capacity threshold is met.

OK, so after a compaction, Cass is still not done with the older sets of
.db files, and I should let Cass delete them?  But I thought one of the
main purposes of compaction was to reclaim disk space.  I'm only playing
around with a small data set, so I can't tell how fast the data grows,
and I'm trying to plan my storage requirements.  Is each newly generated
set as large as the previous one?

The reason I ask is it seems a snapshot is...

>> Q1: Should the files with the lower index #'s (under the
>> data/{keyspace} directory) be manually deleted?  Or, do ALL of the
>> files in this directory need to be backed up?
> Do not ever delete files in your data directory if you care about data 
> on that replica, unless they are from a column family that no longer 
> exists on that server. There may be some duplicate data in the files, 
> but if the files are in the data directory, as a general rule, they 
> are there because they contain some set of data that is in none of the 
> other SSTables.

... It seems a snapshot is implemented, unsurprisingly, as just a link
to the latest (highest-indexed) set, not the previous sets.  So,
obviously, only the latest *.db files will get backed up.  Therefore,
the previous sets must be worthless.
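
For what it's worth, the "just a link" behaviour can be sketched in a few
lines of Python, assuming the snapshot is a hard link to each data file
(which is how I understand nodetool snapshot to work).  The file name
Example-1-Data.db, the snapshots/demo directory, and the helper
snapshot_by_hard_link are all made up for illustration; this is not
Cassandra's code.

```python
import os
import tempfile


def snapshot_by_hard_link(data_file, snapshot_dir):
    """Hard-link data_file into snapshot_dir and return the link's path.

    Hypothetical helper: it only mimics the idea of a link-based snapshot.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    link_path = os.path.join(snapshot_dir, os.path.basename(data_file))
    os.link(data_file, link_path)  # second directory entry, same inode
    return link_path


def demo():
    """Link a fake data file, delete the original name, read via the link."""
    with tempfile.TemporaryDirectory() as root:
        # Made-up file name standing in for one SSTable data file.
        data_file = os.path.join(root, "Example-1-Data.db")
        with open(data_file, "wb") as f:
            f.write(b"row data")

        link = snapshot_by_hard_link(
            data_file, os.path.join(root, "snapshots", "demo")
        )

        # Both names point at the same inode, so no data was copied.
        same_inode = os.stat(data_file).st_ino == os.stat(link).st_ino
        link_count = os.stat(link).st_nlink

        # Removing the original name does not remove the data,
        # because the snapshot link still references it.
        os.remove(data_file)
        with open(link, "rb") as f:
            survived = f.read()

        return same_inode, link_count, survived


if __name__ == "__main__":
    print(demo())
```

If that assumption holds, a linked file keeps its data even after the
original name is removed from the data directory, and the link itself
takes no extra disk space.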
