incubator-cassandra-user mailing list archives

From Benedict Elliott Smith <belliottsm...@datastax.com>
Subject Re: Lots of deletions results in death by GC
Date Wed, 05 Feb 2014 00:29:20 GMT
Is it possible you are generating *exclusively* deletes for this table?


On 5 February 2014 00:10, Robert Wille <rwille@fold3.com> wrote:

> I ran my test again, and Flush Writer's "All time blocked" increased to 2
> and then shortly thereafter GC went into its death spiral. I doubled
> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and
> tried again.
>
> This time, the table that always sat with Memtable data size = 0 now
> showed increases in Memtable data size. That was encouraging. It never
> flushed, which isn't too surprising, because that table has relatively few
> rows and they are pretty wide. However, on the fourth table to clean, Flush
> Writer's "All time blocked" went to 1, and then there were no more
> completed events, and about 10 minutes later GC went into its death spiral.
> I assume that each time Flush Writer completes an event, that means a table
> was flushed. Is that right? Also, I got two dropped mutation messages at
> the same time that Flush Writer's All time blocked incremented.
>
> I then increased the writers and queue size to 3 and 12, respectively, and
> ran my test again. This time All time blocked remained at 0, but I still
> suffered death by GC.
>
> I would almost think that this is caused by high load on the server, but
> I've never seen CPU utilization go above about two of my eight available
> cores. If high load triggers this problem, then that is very disconcerting.
> That means that a CPU spike could permanently cripple a node. Okay, not
> permanently, but until a manual flush occurs.
>
> If anyone has any further thoughts, I'd love to hear them. I'm quite at
> the end of my rope.
>
> Thanks in advance
>
> Robert
>
> From: Nate McCall <nate@thelastpickle.com>
> Reply-To: <user@cassandra.apache.org>
> Date: Saturday, February 1, 2014 at 9:25 AM
> To: Cassandra Users <user@cassandra.apache.org>
> Subject: Re: Lots of deletions results in death by GC
>
> What's the output of 'nodetool tpstats' while this is happening?
> Specifically is Flush Writer "All time blocked" increasing? If so, play
> around with turning up memtable_flush_writers and memtable_flush_queue_size
> and see if that helps.
>
>
> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rwille@fold3.com> wrote:
>
>> A few days ago I posted about an issue I'm having where GC takes a long
>> time (20-30 seconds), and it happens repeatedly and basically no work gets
>> done. I've done further investigation, and I now believe that I know the
>> cause. If I do a lot of deletes, it creates memory pressure until the
>> memtables are flushed, but Cassandra doesn't flush them. If I manually
>> flush, then life is good again (although that takes a very long time
>> because of the GC issue). If I just leave the flushing to Cassandra, then I
>> end up with death by GC. I believe that when the memtables are full of
>> tombstones, Cassandra doesn't realize how much memory the memtables are
>> actually taking up, and so it doesn't proactively flush them in order to
>> free up heap.
>>
>> As I was deleting records out of one of my tables, I was watching it via
>> nodetool cfstats, and I found a very curious thing:
>>
>>                 Memtable cell count: 1285
>>                 Memtable data size, bytes: 0
>>                 Memtable switch count: 56
>>
>> As the deletion process was chugging away, the memtable cell count
>> increased, as expected, but the data size stayed at 0. No flushing
>> occurred.
>>
>> Here's the schema for this table:
>>
>> CREATE TABLE bdn_index_pub (
>>     tshard VARCHAR,
>>     pord INT,
>>     ord INT,
>>     hpath VARCHAR,
>>     page BIGINT,
>>     PRIMARY KEY (tshard, pord)
>> ) WITH gc_grace_seconds = 0
>>   AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>
>> I have a few tables that I run this cleaning process on, and not all of
>> them exhibit this behavior. One of them reported an increasing number of
>> bytes, as expected, and it also flushed as expected. Here's the schema for
>> that table:
>>
>> CREATE TABLE bdn_index_child (
>>     ptshard VARCHAR,
>>     ord INT,
>>     hpath VARCHAR,
>>     PRIMARY KEY (ptshard, ord)
>> ) WITH gc_grace_seconds = 0
>>   AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>
>> In both cases, I'm deleting the entire record (i.e. specifying just the
>> first component of the primary key in the delete statement). Most records
>> in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has
>> just a handful of rows, but a few records can have up to 10,000.
>>
>> A further mystery: 1285 tombstones in the bdn_index_pub memtable
>> don't seem like nearly enough to create a memory problem. Perhaps there
>> are other flaws in the memory metering. Or perhaps there is some other
>> issue that causes Cassandra to mismanage the heap when there are a lot of
>> deletes. One other thought I had is that I page through these tables and
>> clean them out as I go. Perhaps there is some interaction between the
>> paging and the deleting that causes the GC problems and I should create a
>> list of keys to delete and then delete them after I've finished reading the
>> entire table.
>>
>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB)
>> to 1 GB, in hopes that it would force Cassandra to flush tables before I
>> ran into death by GC, but it didn't seem to help.
>>
>> I'm using Cassandra 2.0.4.
>>
>> Any insights would be greatly appreciated. I can't be the only one that
>> has periodic delete-heavy workloads. Hopefully someone else has run into
>> this and can give advice.
>>
>> Thanks
>>
>> Robert
>>
>
>
>
> --
> -----------------
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
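
The quoted message floats the idea of building a list of keys while paging through a table and issuing the deletes only after the read has finished. A minimal sketch of that two-phase shape follows. It is not code from the thread: the names collect_keys and delete_partitions are hypothetical, and an in-memory dict stands in for the table so the example runs without a cluster or driver. Whether this ordering actually avoids the GC problem is not established anywhere in the thread.

```python
from typing import Callable, Iterable, List, Tuple

# (tshard, pord) pairs, mirroring the bdn_index_pub primary key.
Row = Tuple[str, int]


def collect_keys(rows: Iterable[Row],
                 should_delete: Callable[[str], bool]) -> List[str]:
    """Phase 1: read every row and remember each matching partition key once."""
    seen = set()
    keys = []
    for tshard, _pord in rows:
        if tshard not in seen and should_delete(tshard):
            seen.add(tshard)
            keys.append(tshard)
    return keys


def delete_partitions(keys: List[str],
                      delete_fn: Callable[[str], object]) -> int:
    """Phase 2: issue one partition-level delete per key, after the read is done."""
    for key in keys:
        delete_fn(key)
    return len(keys)


# Hypothetical in-memory stand-in for the table: partition key -> clustering values.
table = {"a": [1, 2, 3], "b": [1], "c": [1, 2]}
rows = [(key, pord) for key, pords in table.items() for pord in pords]

keys = collect_keys(rows, lambda key: key != "b")  # phase 1: read only
deleted = delete_partitions(keys, table.pop)       # phase 2: delete only
```

Against a real cluster, delete_fn would wrap a partition-level DELETE (specifying only the first component of the primary key, as described in the quoted message), and rows would come from the paged read.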
