From: Jeff Jirsa
To: "user@cassandra.apache.org", Dongfeng Lu
Subject: Re: How to remove huge files with all expired data sooner?
Date: Mon, 28 Sep 2015 15:36:10 +0000
Message-ID: <16BD0D61-3DEC-4CF4-8A40-861725B54825@crowdstrike.com>

There’s a seldom discussed parameter called:

unchecked_tombstone_compaction

The documentation describes the option as follows:

True enables more aggressive than normal tombstone compactions. A single SSTable tombstone compaction runs without checking the likelihood of success. Cassandra 2.0.9 and later.

You’d need to upgrade to 2.0.9 or newer, but by doing so, and enabling unchecked_tombstone_compaction, you could encourage Cassandra to compact just one single large SSTable to purge tombstones.
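
For illustration, enabling the option might look like the following CQL. This is only a sketch: the keyspace and table name (`my_keyspace.events`) are hypothetical placeholders, and it assumes the table already uses SizeTieredCompactionStrategy on Cassandra 2.0.9 or later.

```cql
-- Hypothetical table name; substitute your own keyspace and table.
-- Assumes Cassandra 2.0.9 or later, where the option is available.
ALTER TABLE my_keyspace.events
WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  -- Run single-SSTable tombstone compactions without the
  -- "likelihood of success" check described in the documentation.
  'unchecked_tombstone_compaction': 'true'
};
```

As far as I know, the compaction map is replaced as a whole by `ALTER TABLE`, so any other subproperties you rely on should be restated in the same statement.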



From: <erickramirezonline@gmail.com> on behalf of Erick Ramirez
Reply-To: "user@cassandra.apache.org"
Date: Sunday, September 27, 2015 at 11:59 PM
To: "user@cassandra.apache.org", Dongfeng Lu
Subject: Re: How to remove huge files with all expired data sooner?

Hello,

You should never run `nodetool compact` since this will result in a massive SSTable that will almost never get compacted out or take a very long time to get compacted out.

You are correct that there needs to be 4 similar-sized SSTables for them to get compacted. If you want the expired data to be deleted quicker, try lowering the STCS `min_threshold` to 3 or even 2. Good luck!
Cheers,
Erick
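
A minimal sketch of the `min_threshold` change suggested above, again using a hypothetical table name (`my_keyspace.events`) and assuming the table stays on SizeTieredCompactionStrategy:

```cql
-- Hypothetical table name; substitute your own keyspace and table.
ALTER TABLE my_keyspace.events
WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  -- Default is 4; with 2, STCS can compact as soon as two
  -- similar-sized SSTables exist instead of waiting for four.
  'min_threshold': '2'
};
```

With roughly one 10G file produced per week, as in the original question, this would let a compaction start after about two weeks rather than four, at the cost of more frequent compaction work.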


On Sat, Sep 26, 2015 at 4:40 AM, Dongfeng Lu <dlu66061@yahoo.com> wrote:
Hi, I have a table where I set the TTL to only 7 days for all records, and we keep pumping records in every day. In general, I would expect all data files for that table to have timestamps less than, say, 8 or 9 days old, giving the system some time to work its magic. However, I occasionally see some files more than 9 days old. Last Friday, I saw 4 large files, each about 10G in size, with timestamps about 5, 4, 3, and 2 weeks old. Interestingly, they are all gone this Monday, leaving 1 new file 9 GB in size.

The compaction strategy is SizeTieredCompactionStrategy, and I can understand why the above happened. It seems we have 10G of data every week, and when SizeTieredCompactionStrategy works to create various tiers, it just happens that the file size for the next tier is 10G, and all the data is packed into this huge file. Then it starts the next cycle. Another week goes by, and another 10G file is created. This process continues until the minimum number of files of the same size is reached, which I think is 4 by default. Then it starts to compact this set of four 10G files. By this time, all data in these 4 files has expired, so we end up with nothing, or a much smaller file if there are still some records with TTL left.

I have many tables like this, and I'd like to reclaim that space sooner. What would be the best way to do it? Should I run "nodetool compact" when I see two large files that are 2 weeks old? Are there configuration parameters I can tune to achieve the same effect? I looked through all the CQL Compaction Subproperties for STCS, but I am not sure how they can help here. Any suggestions are welcome.

BTW, I am using Cassandra 2.0.6.