Date: Tue, 27 Sep 2011 17:16:34 -0400
Subject: RE: Removal of old data files

We now use a TTL of 12 hours and a GC grace period of 8 hours to encourage Cassandra to remove old data files more aggressively.

Cassandra does remove a fair number of old data files. It tends to remove 4 out of every 5 files. I noticed this because each data file has a sequence number as part of its name.

I also noticed that when Cassandra generates *-Compacted files, it generates 4 files at a time. They have consecutive numbers in their file names, but skip one number from the previous group of 4. The missing one is the file that fails to be removed in the end and stays forever.

I looked at the keys in an index file that failed to be removed. If I query any of those keys, Cassandra indicates that there is no data, which is correct because these files are older than 24 hours. All the data must be obsolete due to TTL.
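
A minimal sketch of how the observation above could be checked from a directory listing. It assumes file names follow the pattern of the example quoted further down in this thread (Orders-g-6517-Compacted, with -Data.db as the data component). The function name and the sample names are hypothetical, not part of Cassandra.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <column family>-g-<generation>-<component>,
# inferred only from the example "Orders-g-6517-Compacted" in this thread.
NAME_RE = re.compile(r"^(?P<cf>.+)-g-(?P<gen>\d+)-(?P<component>.+)$")


def classify_generations(filenames):
    """Return (pending, live) lists of generation numbers.

    pending: generations that have a Data.db file and a Compacted marker.
    live:    generations that have a Data.db file and no marker.
    Names that do not match the assumed pattern are ignored.
    """
    components = defaultdict(set)
    for name in filenames:
        match = NAME_RE.match(name)
        if match:
            components[int(match.group("gen"))].add(match.group("component"))

    pending = sorted(gen for gen, comps in components.items()
                     if "Data.db" in comps and "Compacted" in comps)
    live = sorted(gen for gen, comps in components.items()
                  if "Data.db" in comps and "Compacted" not in comps)
    return pending, live


# Made-up sample names for illustration only.
sample = [
    "Orders-g-10-Data.db", "Orders-g-10-Compacted",
    "Orders-g-11-Data.db", "Orders-g-11-Filter.db",
    "notes.txt",
]
print(classify_generations(sample))  # -> ([10], [11])
```

To run it against a real node, pass the result of os.listdir() on the data directory. An old generation that shows up in the second list has no marker, so Cassandra has not yet flagged it as compacted.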

I am wondering why Cassandra does not remove all data files whose timestamps are much older than TTL + grace period.

Has anybody had a similar experience?

Yuki Watanabe

-----Original Message-----
From: Watanabe, Hiroyuki: IT (NYK)
Sent: Friday, September 02, 2011 9:01 AM
To: user@cassandra.apache.org
Subject: RE: Removal of old data files

I see. Thank you for the helpful information.

Yuki

-----Original Message-----
From: Sylvain Lebresne [mailto:sylvain@datastax.com]
Sent: Friday, September 02, 2011 3:40 AM
To: user@cassandra.apache.org
Subject: Re: Removal of old data files

On Fri, Sep 2, 2011 at 12:11 AM, wrote:
> Yes, I see files with names like
>     Orders-g-6517-Compacted
>
> However, all of those files have a size of 0.
>
> From Monday to Thursday we have 5642 files for -Data.db,
> -Filter.db and Statistics.db and only 128 -Compacted files,
> and all of the -Compacted files have a size of 0.
>
> Is this normal, or are we doing something wrong?

You are not doing anything wrong. The -Compacted files are just markers, indicating that the corresponding -Data file (the one with the same number) has in fact been compacted and will eventually be removed. So those files will always have a size of 0.

--
Sylvain

>
>
> yuki
>
> ________________________________
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Thursday, August 25, 2011 6:13 PM
> To: user@cassandra.apache.org
> Subject: Re: Removal of old data files
>
> If Cassandra does not have enough disk space to create a new file, it
> will provoke a JVM GC, which should result in compacted SSTables that
> are no longer needed being deleted. Otherwise they are deleted at some
> time in the future.
> Compacted SSTables have a file written out with a "compacted" extension.
> Do you see compacted SSTables in the data directory?
> Cheers.
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 26/08/2011, at 2:29 AM, yuki watanabe wrote:
>
> We are using Cassandra 0.8.0 with an 8-node ring and only one CF.
> Every column has a TTL of 86400 (24 hours). We also set 'GC grace
> seconds' to 43200 (12 hours). We have to store a massive amount of
> data for one day now, and eventually for five days if we get more
> disk space.
> Even for one day, we do run out of disk space on a busy day.
>
> We run the nodetool compact command at night or as necessary, then we
> run GC from jconsole. We observed that GC did remove files, but not
> necessarily the oldest ones.
> Data files from more than 36 hours ago, and quite often from three
> days ago, are still there.
>
> Is this behavior expected, or do we need to adjust some other parameters?
>
>
> Yuki Watanabe
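
For reference, the "TTL + grace period" arithmetic used throughout this thread can be written out as a small sketch. It assumes, as the messages above do, that a column expires at write time + TTL and becomes eligible for removal once the GC grace period has also passed. The function names are hypothetical. The result is only a lower bound: as the replies explain, a file is not deleted until it has been compacted and later cleaned up.

```python
from datetime import datetime, timedelta


def hours_until_purgeable(ttl_seconds, gc_grace_seconds):
    """Hours after a write before the column can be dropped,
    assuming eligibility at TTL + GC grace (a lower bound only)."""
    return (ttl_seconds + gc_grace_seconds) / 3600


def earliest_purge_time(write_time, ttl_seconds, gc_grace_seconds):
    """Earliest wall-clock time a column written at write_time
    could be dropped, under the same assumption."""
    return write_time + timedelta(seconds=ttl_seconds + gc_grace_seconds)


# Settings from the original message: TTL 86400 s, GC grace 43200 s.
print(hours_until_purgeable(86400, 43200))  # -> 36.0

# Settings from the latest message: TTL 12 hours, GC grace 8 hours.
print(hours_until_purgeable(12 * 3600, 8 * 3600))  # -> 20.0
```

The 36-hour figure matches the threshold mentioned in the original message ("data files from more than 36 hours ago").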