cassandra-user mailing list archives

From Erick Ramirez
Subject Re: How to remove huge files with all expired data sooner?
Date Mon, 28 Sep 2015 06:59:24 GMT

You should never run `nodetool compact`, since it merges everything into
one massive SSTable. STCS only compacts SSTables of similar size, so that
file will almost never get compacted out, or will take a very long time
to get compacted out.

You are correct that there need to be 4 similar-sized SSTables before they
get compacted (4 is the default `min_threshold`). If you want the expired
data to be deleted sooner, try lowering the STCS `min_threshold` to 3 or
even 2. Good luck!
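
To get a rough feel for what a lower `min_threshold` buys you, here is a
toy model in Python. It is only a sketch under simplifying assumptions,
not how Cassandra actually schedules compactions: it assumes exactly one
new 10G SSTable lands in the tier each week, that a compaction fires as
soon as `min_threshold` files are present, and that a file's data is fully
expired once the file is older than the TTL. The function name and its
parameters are made up for illustration.

```python
def simulate(min_threshold, weeks=12, weekly_gb=10, ttl_days=7):
    """Toy model of one STCS tier; returns (peak_disk_gb, oldest_file_age_days).

    Assumptions (not real Cassandra behaviour): one SSTable of `weekly_gb`
    appears every 7 days, and the tier is compacted the moment it holds
    `min_threshold` files.
    """
    files = []          # day on which each SSTable in the tier was written
    peak_gb = 0         # largest amount of disk the tier ever occupied
    oldest_age = 0      # age in days of the oldest file ever seen on disk

    for week in range(1, weeks + 1):
        today = week * 7
        files.append(today)  # this week's SSTable arrives

        peak_gb = max(peak_gb, len(files) * weekly_gb)
        oldest_age = max(oldest_age, today - min(files))

        if len(files) >= min_threshold:
            # Compaction: files whose data is past the TTL are dropped,
            # only files that still hold live data survive the merge.
            files = [day for day in files if today - day < ttl_days]

    return peak_gb, oldest_age


for threshold in (4, 3, 2):
    peak, age = simulate(threshold)
    print(f"min_threshold={threshold}: peak {peak}G, oldest file {age} days")
```

Under these assumptions the default of 4 lets the tier grow to 40G with
files up to 21 days old, 3 gives 30G and 14 days, and 2 gives 20G and 7
days. The setting itself is a per-table compaction option, changed with
something like `ALTER TABLE <keyspace>.<table> WITH compaction =
{'class': 'SizeTieredCompactionStrategy', 'min_threshold': 2};` (the
keyspace and table names are placeholders).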


On Sat, Sep 26, 2015 at 4:40 AM, Dongfeng Lu wrote:

> Hi, I have a table where I set the TTL to only 7 days for all records,
> and we keep pumping records in every day. In general, I would expect all
> data files for that table to have timestamps less than, say, 8 or 9 days
> old, giving the system some time to work its magic. However, I
> occasionally see some files more than 9 days old. Last Friday, I saw 4
> large files, each about 10G in size, with timestamps about 5, 4, 3, and 2
> weeks old. Interestingly, they are all gone this Monday, leaving 1 new
> file 9 GB in size.
> The compaction strategy is SizeTieredCompactionStrategy, and I can
> understand why the above happened. It seems we have 10G of data every week
> and when SizeTieredCompactionStrategy works to create various tiers, it
> just happens that the file size for the next tier is 10G, and all the data
> is packed into this huge file. Then it starts the next cycle. Another week
> goes by, and another 10G file is created. This process continues until the
> minimum number of files of the same size is reached, which I think is 4 by
> default. Then it starts to compact this set of 4 10G files. At this time,
> all data in these 4 files has expired, so we end up with nothing, or a much
> smaller file if there are still some records with TTL left.
> I have many tables like this, and I'd like to reclaim that space sooner.
> What would be the best way to do it? Should I run "nodetool compact" when I
> see two large files that are 2 weeks old? Are there configuration
> parameters I can tune to achieve the same effect? I looked through all the
> CQL Compaction Subproperties for STCS, but I am not sure how they can help
> here. Any suggestion is welcome.
> BTW, I am using Cassandra 2.0.6.
