incubator-cassandra-user mailing list archives

From Robert Wille <rwi...@fold3.com>
Subject Re: Lots of deletions results in death by GC
Date Wed, 05 Feb 2014 15:09:30 GMT
Yes. It's kind of an unusual workload. An insertion phase followed by a
deletion phase, generally not overlapping.

From:  Benedict Elliott Smith <belliottsmith@datastax.com>
Reply-To:  <user@cassandra.apache.org>
Date:  Tuesday, February 4, 2014 at 5:29 PM
To:  <user@cassandra.apache.org>
Subject:  Re: Lots of deletions results in death by GC

Is it possible you are generating exclusively deletes for this table?


On 5 February 2014 00:10, Robert Wille <rwille@fold3.com> wrote:
> I ran my test again, and Flush Writer's "All time blocked" increased to 2 and
> then shortly thereafter GC went into its death spiral. I doubled
> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and tried
> again.
> 
> This time, the table that always sat with Memtable data size = 0 now showed
> increases in Memtable data size. That was encouraging. It never flushed, which
> isn't too surprising, because that table has relatively few rows and they are
> pretty wide. However, on the fourth table to clean, Flush Writer's "All time
> blocked" went to 1, and then there were no more completed events, and about 10
> minutes later GC went into its death spiral. I assume that each time Flush
> Writer completes an event, that means a table was flushed. Is that right?
> Also, I got two dropped mutation messages at the same time that Flush Writer's
> "All time blocked" incremented.
> 
> I then increased the writers and queue size to 3 and 12, respectively, and ran
> my test again. This time All time blocked remained at 0, but I still suffered
> death by GC.
> 
> I would almost think that this is caused by high load on the server, but I've
> never seen CPU utilization go above about two of my eight available cores. If
> high load triggers this problem, then that is very disconcerting. That means
> that a CPU spike could permanently cripple a node. Okay, not permanently, but
> until a manual flush occurs.
> 
> If anyone has any further thoughts, I'd love to hear them. I'm quite at the
> end of my rope.
> 
> Thanks in advance
> 
> Robert
> 
> From:  Nate McCall <nate@thelastpickle.com>
> Reply-To:  <user@cassandra.apache.org>
> Date:  Saturday, February 1, 2014 at 9:25 AM
> To:  Cassandra Users <user@cassandra.apache.org>
> Subject:  Re: Lots of deletions results in death by GC
> 
> What's the output of 'nodetool tpstats' while this is happening? Specifically
> is Flush Writer "All time blocked" increasing? If so, play around with turning
> up memtable_flush_writers and memtable_flush_queue_size and see if that helps.
> 
> 
> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rwille@fold3.com> wrote:
>> A few days ago I posted about an issue I'm having where GC takes a long time
>> (20-30 seconds), and it happens repeatedly and basically no work gets done.
>> I've done further investigation, and I now believe that I know the cause. If
>> I do a lot of deletes, it creates memory pressure until the memtables are
>> flushed, but Cassandra doesn't flush them. If I manually flush, then life is
>> good again (although that takes a very long time because of the GC issue). If
>> I just leave the flushing to Cassandra, then I end up with death by GC. I
>> believe that when the memtables are full of tombstones, Cassandra doesn't
>> realize how much memory the memtables are actually taking up, and so it
>> doesn't proactively flush them in order to free up heap.
>> 
>> As I was deleting records out of one of my tables, I was watching it via
>> nodetool cfstats, and I found a very curious thing:
>> 
>>                 Memtable cell count: 1285
>>                 Memtable data size, bytes: 0
>>                 Memtable switch count: 56
>> 
>> As the deletion process was chugging away, the memtable cell count increased,
>> as expected, but the data size stayed at 0. No flushing occurred.
>> 
>> Here's the schema for this table:
>> 
>> CREATE TABLE bdn_index_pub (
>>     tshard VARCHAR,
>>     pord INT,
>>     ord INT,
>>     hpath VARCHAR,
>>     page BIGINT,
>>     PRIMARY KEY (tshard, pord)
>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>> 
>> 
>> I have a few tables that I run this cleaning process on, and not all of them
>> exhibit this behavior. One of them reported an increasing number of bytes, as
>> expected, and it also flushed as expected. Here's the schema for that table:
>> 
>> 
>> CREATE TABLE bdn_index_child (
>>     ptshard VARCHAR,
>>     ord INT,
>>     hpath VARCHAR,
>>     PRIMARY KEY (ptshard, ord)
>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>> 
>> 
>> In both cases, I'm deleting the entire record (i.e. specifying just the first
>> component of the primary key in the delete statement). Most records in
>> bdn_index_pub have 10,000 rows per record. bdn_index_child usually has just a
>> handful of rows, but a few records can have up to 10,000.
>> 
>> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
>> doesn't seem like nearly enough to create a memory problem. Perhaps there are
>> other flaws in the memory metering. Or perhaps there is some other issue that
>> causes Cassandra to mismanage the heap when there are a lot of deletes. One
>> other thought I had is that I page through these tables and clean them out as
>> I go. Perhaps there is some interaction between the paging and the deleting
>> that causes the GC problems and I should create a list of keys to delete and
>> then delete them after I've finished reading the entire table.
>> 
>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to 1
>> GB, in hopes that it would force Cassandra to flush tables before I ran into
>> death by GC, but it didn't seem to help.
>> 
>> I'm using Cassandra 2.0.4.
>> 
>> Any insights would be greatly appreciated. I can't be the only one that has
>> periodic delete-heavy workloads. Hopefully someone else has run into this and
>> can give advice.
>> 
>> Thanks
>> 
>> Robert
> 
> 
> 
> -- 
> -----------------
> Nate McCall
> Austin, TX
> @zznate
> 
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
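
The check suggested earlier in the thread (watching whether Flush Writer's "All time blocked" keeps climbing in 'nodetool tpstats') could be scripted rather than eyeballed. The following is only a minimal sketch: it assumes the thread-pool rows are printed as a pool name followed by five integer counters (Active, Pending, Completed, Blocked, All time blocked). The helper names parse_tpstats and all_time_blocked are hypothetical, and the sample text and its numbers are made up for illustration, not taken from the node discussed here.

```python
def parse_tpstats(text):
    """Return {pool_name: {counter: int}} for thread-pool rows in tpstats-style text.

    Assumption: each pool row is a name followed by exactly five integer
    counters. Header lines and the dropped-message section do not match
    that shape and are skipped.
    """
    columns = ("active", "pending", "completed", "blocked", "all_time_blocked")
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # too short to be "name + five counters"
        counters = fields[-5:]
        if not all(c.isdigit() for c in counters):
            continue  # e.g. the column header row
        name = " ".join(fields[:-5])
        pools[name] = dict(zip(columns, (int(c) for c in counters)))
    return pools


def all_time_blocked(pools, pool="FlushWriter"):
    """Look up the 'All time blocked' counter, ignoring spaces and case in the pool name."""
    wanted = pool.replace(" ", "").lower()
    for name, stats in pools.items():
        if name.replace(" ", "").lower() == wanted:
            return stats["all_time_blocked"]
    return None  # pool not present in this output


# Illustrative input only; these numbers are invented for the example.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0         123456         0                 0
FlushWriter                       0         0             56         0                 2

Message type           Dropped
MUTATION                     2
"""

if __name__ == "__main__":
    pools = parse_tpstats(SAMPLE)
    print("FlushWriter all time blocked:", all_time_blocked(pools))
```

To use it against a live node, one option is to capture the command's standard output (for example with Python's subprocess module) and pass that text to parse_tpstats at intervals during the deletion phase. A rising value between samples would suggest the flush queue is backing up.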



