incubator-cassandra-user mailing list archives

From: Josh Dzielak <j...@keen.io>
Subject: Re: Notes and questions from performing a large delete
Date: Sat, 07 Dec 2013 19:58:19 GMT
Thanks Nate. I hadn't noticed that and it definitely explains it.

It'd be nice to see that called out much more clearly. As we found out, the implications can be severe!

-Josh 


On Thursday, December 5, 2013 at 11:30 AM, Nate McCall wrote:

> Per the 256mb to 5mb change, check the very last section of this page:
> http://www.datastax.com/documentation/cql/3.0/webhelp/cql/cql_reference/alter_table_r.html
> 
> "Changing any compaction or compression option erases all previous compaction or compression
settings."
> 
> In other words, you have to include the whole 'WITH' clause each time - in the future just grab the output from 'show schema' and add/modify as needed. 
> 
> I did not know this either until it happened to me as well - could probably stand to be a little bit more front-and-center, IMO. 
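> 
> For example, if an ALTER touches compaction at all, restate every compaction sub-option you still want - the table name and values below are only illustrative, pull the real ones from 'show schema' first:
> 
>     ALTER TABLE events
>       WITH gc_grace_seconds = 60
>       AND compaction = {'class': 'LeveledCompactionStrategy',
>                         'sstable_size_in_mb': 256};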
> 
> 
> On Wed, Dec 4, 2013 at 2:59 PM, Josh Dzielak <josh@keen.io> wrote:
> > We recently had a little Cassandra party I wanted to share and see if anyone has notes to compare. Or can tell us what we did wrong or what we could do better. :) Apologies in advance for the length of the narrative here. 
> > 
> > Task at hand: Delete about 50% of the rows in a large column family (~8TB) to reclaim some disk. These rows are used only for intermediate storage.
> > 
> > Sequence of events: 
> > 
> > - Issue the actual deletes. This, obviously, was super-fast.
> > - Nothing happens yet, which makes sense. New tombstones are not immediately compacted b/c of gc_grace_seconds.
> > - Adjust gc_grace_seconds down to 60 from 86400 using ALTER TABLE in CQL.
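> > 
> > (Roughly, in CQL 3 - 'events' and the value 60 here are just the figures from this thread, not a recommendation:)
> > 
> >     ALTER TABLE events WITH gc_grace_seconds = 60;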
> > 
> > - Every node started working very hard. We saw disk space start to free up. It was exciting.
> > - Eventually the compactions finished and we had gotten a ton of disk back. 
> > - However, our SSTables were now 5Mb, not 256Mb as they had always been :(
> > - We inspected the schema in CQL/Opscenter etc and sure enough sstable_size_in_mb had changed to 5Mb for this CF. Previously all CFs were set at 256Mb, and all other CFs still were.
> > 
> > - At 5Mb we had a huge number of SSTables. Our next goal was to get these tables back to 256Mb. 
> > - First step was to update the schema back to 256Mb.
> > - Figuring out how to do this in CQL was tricky, because CQL has gone through a lot of changes recently and getting the docs for your version is hard. Eventually we figured it out - ALTER TABLE events WITH compaction={'class':'LeveledCompactionStrategy','sstable_size_in_mb':256};
> > - Out of our 12 nodes, 9 acknowledged the update. The others still showed the old schema.
> > - The remaining 3 would not. There was no extra load on the systems and operational status was very clean. All nodes could see each other.
> > - For each of the remaining 3 we tried to update the schema through a local cqlsh session. The same ALTER TABLE would just hang forever.
> > - We restarted Cassandra on each of the 3 nodes, then did the ALTER TABLE again. It worked this time. We finally had schema agreement (a quick check for this is sketched just below).
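> > 
> > (Every node should report one and the same schema version; two quick ways to check on 1.2:)
> > 
> >     nodetool describecluster
> > 
> >     cqlsh> SELECT schema_version FROM system.local;
> >     cqlsh> SELECT peer, schema_version FROM system.peers;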
> > 
> > - Starting with just 1 node, we kicked off upgradesstables, hoping it would rebuild the 5Mb tables to 256Mb tables.
> > - Nothing happened. This was (afaik) because changing the sstable size doesn't bump the on-disk sstable version, so the existing tables were considered current and skipped.
> > - We discovered the "-a" option for upgradesstables, which tells it to skip that check and just do all the tables anyway (an example invocation is sketched at the end of this list).
> > - We ran upgradesstables -a and things started happening. After a few hours the pending compactions finished.
> > - Sadly, this node was now using 3x the disk it previously had. Some sstables were now 256Mb, but not all. There were tens of thousands of ~20Mb tables.
> > - A direct comparison to other nodes owning the same % of the ring showed both the same number of sstables and the same ratio of 256Mb+ tables to small tables. However, on a 'normal' node the small tables were all 5-6Mb, while on the fat, upgraded node they were all 20Mb+. This was why the fat node was taking up 3x disk overall.
> > - I tried to see what was in those 20Mb files relative to the 5Mb ones but sstable2json failed against our authenticated keyspace. I filed a bug (https://issues.apache.org/jira/browse/CASSANDRA-6450).
> > - Had little choice here. We shut down the fat node, did a manual delete of sstables, brought it back up and did a repair. It came back to the right size.
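> > 
> > (For reference - keyspace/table names below are placeholders, and "-a" is what forces sstables already on the current version to be rewritten:)
> > 
> >     # rewrite every sstable for one CF, even ones already on the current version
> >     nodetool upgradesstables -a <keyspace> <column_family>
> >     # stream data back onto a node after a manual sstable delete
> >     nodetool repair <keyspace> <column_family>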
> > 
> > TL;DR / Our big questions are: 
> > How could the schema have spontaneously changed from 256Mb sstable_size_in_mb to 5Mb?
> > How could schema propagation have failed such that only 9 of 12 nodes got the change even when the cluster was healthy? Why did updating the schema locally hang until restart?
> > What could have happened inside of upgradesstables that created the node with the same ring % but 3x disk load?
> > 
> > We're on Cassandra 1.2.8, Java 6, Ubuntu 12. Running on SSDs, 12-node cluster across 2 DCs. No compression, leveled compaction. Happy to provide more details. Thanks in advance for any insights into what happened or any best practices we missed during this episode. 
> > 
> > Best,
> > Josh
> > 
> 
> -- 
> -----------------
> Nate McCall
> Austin, TX
> @zznate
> 
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com 

