We now use a TTL of 12 hours and a GC grace period of 8 hours to encourage Cassandra to remove old data files more aggressively.
Cassandra does remove a fair amount of old data files.
It tends to remove 4 out of every 5 files.
I noticed this because each data file has a sequence number as part of its name.
I also noticed that when Cassandra generates *-Compacted files, it generates 4 files at a time.
They have consecutive numbers in their file names, but each group skips one number after the previous group of 4.
The skipped number belongs to the file that fails to be removed in the end and stays forever.
I looked at the keys in an index file that failed to be removed. If I query any of those keys, Cassandra indicates that there is no data, which is correct because these files are older than 24 hours. All of the data must be obsolete due to the TTL.
I am wondering why Cassandra does not remove all data files whose timestamp is much older than TTL + grace period.
Does anybody have a similar experience?
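For anyone who wants to check the same thing on their own nodes, here is a rough Python sketch of how the leftover files could be listed. It assumes the file names follow the pattern seen in this thread (for example Orders-g-6517-Compacted, so column family, version letter, sequence number, component) and that the file modification time is a fair proxy for the age of the data. The function names and the data directory argument are made up for illustration; this is not a Cassandra tool.

```python
import os
import re
import time

# Assumed layout, inferred from names in this thread such as
# "Orders-g-6517-Compacted": <column family>-<version>-<number>-<component>
SSTABLE_NAME = re.compile(
    r"^(?P<cf>.+)-(?P<version>[a-z]+)-(?P<gen>\d+)-(?P<component>.+)$"
)


def stale_data_files(names_and_mtimes, ttl_seconds, gc_grace_seconds, now):
    """Return sorted (sequence number, name) pairs for -Data.db files whose
    modification time is older than TTL + GC grace."""
    cutoff = now - (ttl_seconds + gc_grace_seconds)
    stale = []
    for name, mtime in names_and_mtimes:
        m = SSTABLE_NAME.match(name)
        if m and m.group("component") == "Data.db" and mtime < cutoff:
            stale.append((int(m.group("gen")), name))
    return sorted(stale)


def scan(data_dir, ttl_seconds, gc_grace_seconds):
    """Hypothetical helper: apply stale_data_files to a real directory."""
    entries = [
        (name, os.path.getmtime(os.path.join(data_dir, name)))
        for name in os.listdir(data_dir)
    ]
    return stale_data_files(entries, ttl_seconds, gc_grace_seconds, time.time())
```

With the settings above, the call would be scan(data_dir, 12 * 3600, 8 * 3600). The result only tells you which files are older than the cutoff on disk, not why Cassandra kept them.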
From: Watanabe, Hiroyuki: IT (NYK)
Sent: Friday, September 02, 2011 9:01 AM
Subject: RE: Removal of old data files
I see. Thank you for the helpful information.
From: Sylvain Lebresne [mailto:email@example.com]
Sent: Friday, September 02, 2011 3:40 AM
Subject: Re: Removal of old data files
On Fri, Sep 2, 2011 at 12:11 AM, <firstname.lastname@example.org> wrote:
Yes, I see files with names like
Orders-g-6517-Compacted
However, all of those files have a size of 0.
From Monday to Thursday we have 5642 files for -Data.db, -Filter.db and -Statistics.db, and only 128 -Compacted files, and all of the -Compacted files have a size of 0.
Is this normal, or are we doing something wrong?
You are not doing anything wrong. The -Compacted files are just markers, indicating that the corresponding -Data file (the one with the same number) has in fact been compacted and will eventually be removed. So those files will always have a size of 0.
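To make the marker relationship concrete, here is a small Python sketch that pairs each -Compacted marker with the -Data.db file carrying the same number. It assumes the naming pattern shown in this thread (Orders-g-6517-Compacted next to Orders-g-6517-Data.db); the function name is hypothetical and nothing here is part of Cassandra itself.

```python
import re

# Assumed layout, inferred from names quoted in this thread:
# <column family>-<version>-<number>-<component>
NAME = re.compile(
    r"^(?P<cf>.+)-(?P<version>[a-z]+)-(?P<gen>\d+)-(?P<component>.+)$"
)


def classify_generations(filenames):
    """Return (pending_delete, live): numbers of -Data.db files that do or
    do not have a matching -Compacted marker."""
    data_gens, marked_gens = set(), set()
    for name in filenames:
        m = NAME.match(name)
        if not m:
            continue  # not an SSTable component under the assumed pattern
        gen = int(m.group("gen"))
        if m.group("component") == "Data.db":
            data_gens.add(gen)
        elif m.group("component") == "Compacted":
            marked_gens.add(gen)
    pending_delete = sorted(data_gens & marked_gens)
    live = sorted(data_gens - marked_gens)
    return pending_delete, live
```

Passing os.listdir() of the column family's data directory gives the two lists. Numbers in the first list are waiting for deletion; numbers in the second are still treated as live SSTables.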
yuki

________________________________
From: aaron morton [mailto:email@example.com]
Sent: Thursday, August 25, 2011 6:13 PM
Subject: Re: Removal of old data files

If Cassandra does not have enough disk space to create a new file, it will provoke a JVM GC, which should result in compacted SSTables that are no longer needed being deleted. Otherwise they are deleted at some time in the future.

Compacted SSTables have a file written out with a "compacted" extension.

Do you see compacted SSTables in the data directory?

Cheers.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton

On 26/08/2011, at 2:29 AM, yuki watanabe wrote:

We are using Cassandra 0.8.0 with an 8-node ring and only one CF. Every column has a TTL of 86400 (24 hours). We also set 'GC grace second' to 43200 (12 hours). We have to store a massive amount of data for one day now, and eventually for five days if we get more disk space. Even for one day, we do run out of disk space on a busy day.

We run the nodetool compact command at night or as necessary, then we run GC from jconsole. We observed that GC did remove files, but not necessarily the oldest ones. Data files from more than 36 hours ago, and quite often from three days ago, are still there.

Is this behavior expected, or do we need to adjust some other parameters?

Yuki Watanabe

_______________________________________________
This e-mail may contain information that is confidential, privileged or otherwise protected from disclosure. If you are not an intended recipient of this e-mail, do not duplicate or redistribute it by any means. Please delete it and any attachments and notify the sender that you have received it in error. Unless specifically indicated, this e-mail is not an offer to buy or sell or a solicitation to buy or sell any securities, investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Barclays. Any views or opinions presented are solely those of the author and do not necessarily represent those of Barclays. This e-mail is subject to terms available at the following link: www.barcap.com/emaildisclaimer. By messaging with Barclays you consent to the foregoing. Barclays Capital is the investment banking division of Barclays Bank PLC, a company registered in England (number 1026167) with its registered office at 1 Churchill Place, London, E14 5HP. This email may relate to or be sent from other members of the Barclays Group.
_______________________________________________