incubator-cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: Notes and questions from performing a large delete
Date Sun, 08 Dec 2013 01:16:20 GMT
It is definitely unexpected, and silently resetting such important
settings can be very impactful.

On Saturday, December 7, 2013, Josh Dzielak <> wrote:
> Thanks Nate. I hadn't noticed that and it definitely explains it.
> It'd be nice to see that called out much more clearly. As we found out,
the implications can be severe!
> -Josh
> On Thursday, December 5, 2013 at 11:30 AM, Nate McCall wrote:
> Per the 256mb to 5mb change, check the very last section of this page:
> "Changing any compaction or compression option erases all previous
compaction or compression settings."
> In other words, you have to include the whole 'WITH' clause each time;
in the future, just grab the output from 'show schema' and add/modify as
> I did not know this either until it happened to me; it could
probably stand to be a little more front-and-center, IMO.
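Nate's advice can be sketched in code: start from every compaction sub-option the table has today, overlay only the ones being changed, and emit one statement that restates the full map, so nothing is silently erased. This is a minimal Python sketch, not a tool from the thread. The helper names `cql_map` and `alter_compaction` are hypothetical, and the example values assume LeveledCompactionStrategy (the strategy that `sstable_size_in_mb` applies to) and the `events` table mentioned later in the thread; it is not necessarily the exact statement the posters ran.

```python
def cql_map(options: dict) -> str:
    """Render a Python dict as a CQL map literal such as {'class': 'X'}."""
    parts = []
    for key, value in options.items():
        if isinstance(value, str):
            # Quote strings, doubling any embedded single quotes (CQL escaping).
            rendered = "'" + value.replace("'", "''") + "'"
        else:
            rendered = str(value)  # numbers are emitted unquoted
        parts.append(f"'{key}': {rendered}")
    return "{" + ", ".join(parts) + "}"


def alter_compaction(table: str, current: dict, changes: dict) -> str:
    """Build an ALTER TABLE that restates the *whole* compaction map.

    `current` holds every compaction sub-option the table has now (copied
    from the schema output); `changes` holds only what should change.
    Because changing any compaction option erases the previous settings,
    starting from `current` keeps the options you did not mean to touch.
    """
    merged = dict(current)   # copy, so the caller's dict is not mutated
    merged.update(changes)   # overlay only the changed sub-options
    return f"ALTER TABLE {table} WITH compaction = {cql_map(merged)};"


if __name__ == "__main__":
    # Hypothetical current state: LCS whose size was reset to 5 MB.
    current = {"class": "LeveledCompactionStrategy", "sstable_size_in_mb": 5}
    print(alter_compaction("events", current, {"sstable_size_in_mb": 256}))
```

Run as a script, this prints `ALTER TABLE events WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 256};`, which keeps the strategy class while restoring the size.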
> On Wed, Dec 4, 2013 at 2:59 PM, Josh Dzielak <> wrote:
> We recently had a little Cassandra party that I wanted to share, to see if
anyone has notes to compare or can tell us what we did wrong or what we
could do better. :) Apologies in advance for the length of the narrative
> Task at hand: Delete about 50% of the rows in a large column family
(~8TB) to reclaim some disk. These rows are used only for intermediate
> Sequence of events:
> - Issue the actual deletes. This, obviously, was super-fast.
> - Nothing happens yet, which makes sense. New tombstones are not
purged by compaction right away because of gc_grace_seconds.
> - Adjust gc_grace_seconds down to 60 from 86400 using ALTER TABLE in CQL.
> - Every node started working very hard. We saw disk space start to free
up. It was exciting.
> - Eventually the compactions finished and we had gotten a ton of disk
> - However, our SSTables were now 5Mb, not 256Mb as they had always been :(
> - We inspected the schema in CQL/OpsCenter, etc., and sure enough,
sstable_size_in_mb had changed to 5Mb for this CF. Previously all CFs were
set at 256Mb, and all other CFs still were.
> - At 5Mb we had a huge number of SSTables. Our next goal was to get these
tables back to 256Mb.
> - First step was to update the schema back to 256Mb.
> - Figuring out how to do this in CQL was tricky, because CQL has gone
through a lot of changes recently and getting the docs for your version is
hard. Eventually we figured it out - ALTER TABLE events WITH
> - Out of our 12 nodes, 9 acknowledged the update. The other 3 still
showed the old schema.
> - Those 3 would not pick up the change. There was no extra load on the
systems, and operational status was very clean. All nodes could see each other.
> - For each of the remaining 3 we tried to update the schema through a
local cqlsh session. The same ALTER TABLE would just hang forever.
> - We restarted Cassandra on each of the 3 nodes, then did the ALTER TABLE
again. It worked this time. We finally had schema agreement.
> - Starting with just 1 node, we kicked off upgradesstables, hoping it
would rebuild the 5Mb tables to 256Mb tables.
> - Nothing happened. This was (AFAIK) because the sstable size change
doesn't represent a new version of schema for the sstables, so existing
tables were ignored.
> - We discovered the "-a" option for upgradesstables, which tells it to
skip the schema check and just rewrite all the tables anyway.
> - We ran upgradesstables -a and things started happening. After a few
hours the pending compactions finished.
> - Sadly, this node was now using 3x the disk it previously had. Some
sstables were now 256Mb, but not all. There were tens of thousands of ~20Mb
> - A direct comparison to other nodes owning the same % of the ring showed
both the same number of sstables and the same ratio of 256Mb+ tables to
small tables. However, on a 'normal' node the small tables were all 5-6Mb,
while on the fat, upgraded node all the small tables were 20Mb+. This was why the
fat node was taking up 3x disk overall.
> - I tried to see what was in those 20Mb files relative to the 5Mb ones,
but sstable2json failed against our authenticated keyspace. I filed a bug.
> - We had little choice here. We shut down the fat node, did a manual delete
of sstables, br
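The gc_grace_seconds step in the narrative explains why nothing was reclaimed at first and why lowering it from 86400 to 60 set every node compacting. As a rough sketch under a simplified model that covers only the time condition, a tombstone becomes eligible to be dropped by compaction once its deletion time plus gc_grace_seconds is in the past. The function `tombstone_purgeable` and the timestamps are hypothetical illustrations, not Cassandra code.

```python
from datetime import datetime, timedelta, timezone


def tombstone_purgeable(deleted_at: datetime, gc_grace_seconds: int,
                        now: datetime) -> bool:
    """Simplified model: has this tombstone outlived gc_grace_seconds?

    Assumption: this captures only the time condition. Compaction applies
    further checks (roughly, the row must not still be present in SSTables
    that are not part of the compaction) before it actually drops it.
    """
    return deleted_at + timedelta(seconds=gc_grace_seconds) < now


if __name__ == "__main__":
    deleted = datetime(2013, 12, 4, 12, 0, tzinfo=timezone.utc)  # hypothetical
    five_minutes_later = deleted + timedelta(minutes=5)
    print(tombstone_purgeable(deleted, 86400, five_minutes_later))  # False
    print(tombstone_purgeable(deleted, 60, five_minutes_later))     # True
```

With the default-style 86400 seconds, a five-minute-old tombstone still has to be kept; at 60 seconds it is already eligible, which matches the burst of compaction work described above.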

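The two size observations in the thread (a huge number of SSTables at 5Mb, and the upgraded node using about 3x the disk while holding the same table counts) come down to simple arithmetic. The Python sketch below uses entirely hypothetical inputs (500 GB of data for the CF on one node, 30,000 small and 300 large tables), chosen only to show how identical counts with ~20Mb instead of ~5Mb small tables can roughly triple a node's footprint; the function names are illustrative.

```python
import math


def sstable_count(data_mb: float, sstable_size_mb: float) -> int:
    """Approximate number of SSTables needed to hold data_mb at a target size."""
    return math.ceil(data_mb / sstable_size_mb)


def footprint_mb(n_small: int, small_mb: float,
                 n_large: int, large_mb: float) -> float:
    """Disk used by a node holding n_small small and n_large large SSTables."""
    return n_small * small_mb + n_large * large_mb


if __name__ == "__main__":
    # Hypothetical 500 GB (512,000 MB) of data for this CF on one node.
    print(sstable_count(512_000, 5))    # 102400
    print(sstable_count(512_000, 256))  # 2000

    # Same table counts on both nodes (as observed), different small sizes.
    normal = footprint_mb(30_000, 5, 300, 256)   # 226800
    fat = footprint_mb(30_000, 20, 300, 256)     # 676800
    print(round(fat / normal, 2))                # 2.98
```

With the counts held equal, the per-file size of the small tables dominates, because in this example they outnumber the large ones by two orders of magnitude.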
Sorry, this was sent from mobile. Will do less grammar and spell check than
