From: Aaron Morton
Subject: Re: Cassandra SSTable deletion/load reporting question
Date: Mon, 28 Oct 2013 19:47:09 +1300
To: Cassandra User <user@cassandra.apache.org>
> 1.2 w/ vnodes using LeveledCompactionStrategy, using 128 MB SSTables.
If you are using LCS, the amount of overwritten / deleted data left behind will be small. 

Your row will be present in at most one SSTable per level. The number of levels is included in the output of nodetool cfstats on the SSTable count line, which shows the number of SSTables at each level. 
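
A quick way to pull those lines out (the grep pattern is illustrative; the exact labels vary a little between versions):

    # dump stats for all column families and keep the SSTable lines
    nodetool cfstats | grep -i sstable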

If you really want to know which SSTables contain your row, use either sstable2json or sstablekeys. 
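
For example, something along these lines (the data directory layout and the key are hypothetical; sstablekeys prints keys hex-encoded, so grep for the hex form of your row key):

    # list sstables whose key index contains the row
    # (6d796b6579 is hex for the hypothetical key "mykey")
    for f in /var/lib/cassandra/data/mykeyspace/mytable/*-Data.db; do
        sstablekeys "$f" | grep -q 6d796b6579 && echo "$f"
    done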

Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 26/10/2013, at 9:20 am, Jasdeep Hundal <dsjas297@gmail.com> wrote:

Thanks Rob.

Will check out the tool you linked to. In our case it's definitely not the tombstones hanging around, since we write entire rows at once and the amount of data in a row is far, far greater than the space a tombstone takes.

Jasdeep


On Fri, Oct 25, 2013 at 1:14 PM, Robert Coli <rcoli@eventbrite.com> wrote:
On Fri, Oct 25, 2013 at 1:10 PM, Jasdeep Hundal <dsjas297@gmail.com> wrote:

After performing a large set of deletes on our cluster, a few hundred gigabytes' worth (essentially cleaning out nearly all old data), we noticed that nodetool reported about the same load as before.

Tombstones are purgeable only after gc_grace_seconds has elapsed, and only if all SSTables which contain fragments of that row are involved in the compaction.
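
For reference, gc_grace_seconds is a per-table setting; a sketch of adjusting it via cqlsh (keyspace/table names are hypothetical, and it should only be lowered if repair reliably completes within the new window):

    # 864000 seconds = 10 days, the default
    echo "ALTER TABLE mykeyspace.mytable WITH gc_grace_seconds = 864000;" | cqlsh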
 
From my understanding, running a repair should have triggered compactions between SSTable files, and reference counting on the subsequent restart of Cassandra on a node should have cleared the old files, but this did not appear to happen. The load did not start going down until we were writing to the cluster again.

Repair is unrelated to minor compaction, except in that it creates new SSTables via streaming, which may trigger minor compaction.
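
For example, a rolling repair of each node's primary range (keyspace name hypothetical); the SSTables streamed in by it may then trigger minor compactions:

    # repair only this node's primary range, run one node at a time
    nodetool repair -pr mykeyspace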
 
I suspect that there are a few values hanging around in the old tables, so the references stayed intact; can anyone confirm this?

Stop suspecting and measure with checksstablegarbage: https://github.com/cloudian/support-tools
 
What's a bit more important for me is being able to accurately report the size of the "active" data set, since nodetool doesn't seem to be useful for this. I use counters for reporting some of it, but is there a single source of truth, especially given that counters do occasionally miss updates?

In very new versions of Cassandra, there is tracking of, and metrics available for, what percentage of the data in an SSTable is expired.
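
One way to look at this per SSTable, assuming your version ships the sstablemetadata tool (its output includes an estimated droppable-tombstone ratio; the path shown is hypothetical):

    # inspect sstable-level metadata, including droppable tombstone estimate
    sstablemetadata /var/lib/cassandra/data/mykeyspace/mytable/mykeyspace-mytable-ic-1-Data.db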

=Rob
 

