incubator-cassandra-user mailing list archives

From Benedict Elliott Smith <belliottsm...@datastax.com>
Subject Re: Lots of deletions results in death by GC
Date Wed, 05 Feb 2014 15:32:15 GMT
I believe there is a bug, and I have filed a ticket for it:
https://issues.apache.org/jira/browse/CASSANDRA-6655

I will have a patch uploaded shortly, but it's just missed the 2.0.5
release window, so you'll either need to grab the development branch once
it's committed or wait until 2.0.6.


On 5 February 2014 15:09, Robert Wille <rwille@fold3.com> wrote:

> Yes. It's kind of an unusual workload. An insertion phase followed by a
> deletion phase, generally not overlapping.
>
> From: Benedict Elliott Smith <belliottsmith@datastax.com>
> Reply-To: <user@cassandra.apache.org>
> Date: Tuesday, February 4, 2014 at 5:29 PM
> To: <user@cassandra.apache.org>
>
> Subject: Re: Lots of deletions results in death by GC
>
> Is it possible you are generating *exclusively* deletes for this table?
>
>
> On 5 February 2014 00:10, Robert Wille <rwille@fold3.com> wrote:
>
>> I ran my test again, and Flush Writer's "All time blocked" increased to 2
>> and then shortly thereafter GC went into its death spiral. I doubled
>> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and
>> tried again.
>>
>> This time, the table that always sat with Memtable data size = 0 now
>> showed increases in Memtable data size. That was encouraging. It never
>> flushed, which isn't too surprising, because that table has relatively few
>> rows and they are pretty wide. However, on the fourth table to clean, Flush
>> Writer's "All time blocked" went to 1, and then there were no more
>> completed events, and about 10 minutes later GC went into its death spiral.
>> I assume that each time Flush Writer completes an event, that means a table
>> was flushed. Is that right? Also, I got two dropped mutation messages at
>> the same time that Flush Writer's All time blocked incremented.
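Robert is inferring flush progress from two FlushWriter counters in tpstats: Completed, which he assumes counts finished memtable flushes, and All time blocked. A minimal Python sketch of that heuristic, assuming the counters have already been sampled periodically. The `FlushSample` type and `flush_stalled` function are hypothetical names, the sample values are made up, and the rule "blocked rose, then Completed stopped advancing" only illustrates his observation; it is not an official diagnostic:

```python
from typing import NamedTuple, Sequence


class FlushSample(NamedTuple):
    completed: int         # FlushWriter "Completed" at sample time
    all_time_blocked: int  # FlushWriter "All time blocked" at sample time


def flush_stalled(samples: Sequence[FlushSample], window: int = 3) -> bool:
    """True if 'All time blocked' rose over the series and 'Completed'
    has not advanced across the last `window` samples (hypothetical rule)."""
    if len(samples) < window + 1:
        return False  # not enough history to judge
    blocked_rose = samples[-1].all_time_blocked > samples[0].all_time_blocked
    recent = samples[-window:]
    no_progress = all(s.completed == recent[0].completed for s in recent)
    return blocked_rose and no_progress


# Made-up sample series, for illustration only.
stalled = [FlushSample(50, 0), FlushSample(55, 0), FlushSample(56, 1),
           FlushSample(56, 1), FlushSample(56, 1)]
healthy = [FlushSample(50, 0), FlushSample(55, 0), FlushSample(60, 0), FlushSample(65, 0)]
print(flush_stalled(stalled), flush_stalled(healthy))  # prints: True False
```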
>>
>> I then increased the writers and queue size to 3 and 12, respectively,
>> and ran my test again. This time All time blocked remained at 0, but I
>> still suffered death by GC.
>>
>> I would almost think that this is caused by high load on the server, but
>> I've never seen CPU utilization go above about two of my eight available
>> cores. If high load triggers this problem, then that is very disconcerting.
>> That means that a CPU spike could permanently cripple a node. Okay, not
>> permanently, but until a manual flush occurs.
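The manual flush mentioned here is done with `nodetool flush`, which takes a keyspace and optional table names. A small Python sketch that builds, and optionally runs, that command; `flush_command` and `flush_now` are hypothetical helpers, and `my_keyspace` is a placeholder because the thread never names the keyspace:

```python
import shlex
import subprocess
from typing import List


def flush_command(keyspace: str, *tables: str) -> List[str]:
    """Build the argv for `nodetool flush <keyspace> [table ...]`."""
    return ["nodetool", "flush", keyspace, *tables]


def flush_now(keyspace: str, *tables: str) -> None:
    """Run the flush against the local node; raises CalledProcessError on failure."""
    subprocess.run(flush_command(keyspace, *tables), check=True)


# Print the command rather than running it (my_keyspace is a placeholder).
print(shlex.join(flush_command("my_keyspace", "bdn_index_pub")))
# prints: nodetool flush my_keyspace bdn_index_pub
```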
>>
>> If anyone has any further thoughts, I'd love to hear them. I'm quite at
>> the end of my rope.
>>
>> Thanks in advance
>>
>> Robert
>>
>> From: Nate McCall <nate@thelastpickle.com>
>> Reply-To: <user@cassandra.apache.org>
>> Date: Saturday, February 1, 2014 at 9:25 AM
>> To: Cassandra Users <user@cassandra.apache.org>
>> Subject: Re: Lots of deletions results in death by GC
>>
>> What's the output of 'nodetool tpstats' while this is happening?
>> Specifically is Flush Writer "All time blocked" increasing? If so, play
>> around with turning up memtable_flush_writers and memtable_flush_queue_size
>> and see if that helps.
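Nate's first check can be scripted. A minimal Python sketch that pulls the FlushWriter "All time blocked" value out of `nodetool tpstats` text, assuming the 2.0-era column layout (Pool Name, Active, Pending, Completed, Blocked, All time blocked) with the pool named `FlushWriter`. Later Cassandra versions rename pools, so treat the layout as an assumption; the sample output is illustrative, not captured from Robert's node:

```python
def flush_writer_blocked(tpstats_output: str, pool: str = "FlushWriter") -> int:
    """Return the 'All time blocked' count for `pool` from tpstats text.

    Assumes the last whitespace-separated column of each pool row
    is 'All time blocked'.
    """
    for line in tpstats_output.splitlines():
        fields = line.split()
        # Data rows start with the pool name; the header starts with "Pool".
        if fields and fields[0] == pool:
            return int(fields[-1])
    raise ValueError(f"pool {pool!r} not found in tpstats output")


# Illustrative output only; the numbers are not from the thread's cluster.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0         912345         0                 0
FlushWriter                       0         0             56         0                 2
"""

print(flush_writer_blocked(SAMPLE))  # prints 2
```

In practice the text would come from something like `subprocess.run(["nodetool", "tpstats"], capture_output=True, text=True).stdout`, sampled periodically so the value can be compared over time.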
>>
>>
>> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rwille@fold3.com> wrote:
>>
>>> A few days ago I posted about an issue I'm having where GC takes a long
>>> time (20-30 seconds), and it happens repeatedly and basically no work gets
>>> done. I've done further investigation, and I now believe that I know the
>>> cause. If I do a lot of deletes, it creates memory pressure until the
>>> memtables are flushed, but Cassandra doesn't flush them. If I manually
>>> flush, then life is good again (although that takes a very long time
>>> because of the GC issue). If I just leave the flushing to Cassandra, then I
>>> end up with death by GC. I believe that when the memtables are full of
>>> tombstones, Cassandra doesn't realize how much memory the memtables are
>>> actually taking up, and so it doesn't proactively flush them in order to
>>> free up heap.
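Robert's hypothesis is that flushing is driven by an estimate of memtable size, and that tombstones contribute nothing to that estimate. A toy Python model of that failure mode, not Cassandra's actual accounting code: the hypothetical `ToyMemtable` flushes when its metered bytes reach a threshold, while separately tracking the real bytes it holds on the heap. All sizes are arbitrary:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMemtable:
    """Hypothetical model: flush decisions use metered bytes, while the
    heap actually holds real bytes. Not Cassandra's implementation."""
    flush_threshold: int
    metered_bytes: int = 0
    real_bytes: int = 0
    flushes: List[int] = field(default_factory=list)

    def write(self, real_size: int, metered_size: int) -> None:
        self.real_bytes += real_size
        self.metered_bytes += metered_size
        if self.metered_bytes >= self.flush_threshold:
            self.flushes.append(self.real_bytes)  # heap freed by this flush
            self.metered_bytes = 0
            self.real_bytes = 0


# Regular writes: metering matches reality, so flushes happen.
live = ToyMemtable(flush_threshold=1_000)
for _ in range(100):
    live.write(real_size=50, metered_size=50)

# Tombstones metered as 0 bytes: the threshold is never crossed.
deletes = ToyMemtable(flush_threshold=1_000)
for _ in range(100):
    deletes.write(real_size=50, metered_size=0)

print(len(live.flushes), live.real_bytes)        # prints: 5 0
print(len(deletes.flushes), deletes.real_bytes)  # prints: 0 5000
```

With correct metering the model flushes five times and ends empty; with tombstones metered at zero it never flushes and its real footprint only grows.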
>>>
>>> As I was deleting records out of one of my tables, I was watching it via
>>> nodetool cfstats, and I found a very curious thing:
>>>
>>>                 Memtable cell count: 1285
>>>                 Memtable data size, bytes: 0
>>>                 Memtable switch count: 56
>>>
>>> As the deletion process was chugging away, the memtable cell count
>>> increased, as expected, but the data size stayed at 0. No flushing
>>> occurred.
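One way to watch for this symptom is to parse the memtable counters from a table's `nodetool cfstats` block and flag the case where cells accumulate while the data size stays at zero. A minimal Python sketch; the field names are copied from the output quoted above, while the function names and the "looks wrong" rule are hypothetical, a heuristic rather than something Cassandra reports:

```python
import re
from typing import Dict

# One memtable counter per line, e.g. "Memtable cell count: 1285".
_COUNTER = re.compile(r"^[ \t]*(Memtable [^:\n]+):[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def parse_memtable_stats(cfstats_block: str) -> Dict[str, int]:
    """Extract the integer 'Memtable ...' counters from one table's cfstats text."""
    return {name: int(value) for name, value in _COUNTER.findall(cfstats_block)}


def metering_looks_wrong(stats: Dict[str, int]) -> bool:
    """Heuristic: cells are accumulating but no bytes are accounted for them."""
    return (stats.get("Memtable cell count", 0) > 0
            and stats.get("Memtable data size, bytes", 0) == 0)


# The three lines Robert quotes from nodetool cfstats.
OBSERVED = """\
                Memtable cell count: 1285
                Memtable data size, bytes: 0
                Memtable switch count: 56
"""

stats = parse_memtable_stats(OBSERVED)
print(stats["Memtable cell count"], metering_looks_wrong(stats))  # prints: 1285 True
```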
>>>
>>> Here's the schema for this table:
>>>
>>> CREATE TABLE bdn_index_pub (
>>>     tshard VARCHAR,
>>>     pord INT,
>>>     ord INT,
>>>     hpath VARCHAR,
>>>     page BIGINT,
>>>     PRIMARY KEY (tshard, pord)
>>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>>     'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>>
>>> I have a few tables that I run this cleaning process on, and not all of
>>> them exhibit this behavior. One of them reported an increasing number of
>>> bytes, as expected, and it also flushed as expected. Here's the schema for
>>> that table:
>>>
>>> CREATE TABLE bdn_index_child (
>>>     ptshard VARCHAR,
>>>     ord INT,
>>>     hpath VARCHAR,
>>>     PRIMARY KEY (ptshard, ord)
>>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>>     'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>>
>>> In both cases, I'm deleting the entire record (i.e. specifying just the
>>> first component of the primary key in the delete statement). Most records
>>> in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has
>>> just a handful of rows, but a few records can have up to 10,000.
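In CQL terms, a partition-level delete has only the partition key in its WHERE clause, whereas a row-level delete also names the clustering column. A small Python helper that produces both shapes as bind-marker strings, using the column names from the schemas in this message; the helper is hypothetical, and the thread does not show Robert's actual statements:

```python
from typing import Optional


def delete_cql(table: str, partition_key: str, clustering_key: Optional[str] = None) -> str:
    """Build a CQL DELETE with bind markers (hypothetical helper).

    Partition key only: removes the whole partition, i.e. all of its rows.
    Partition key plus clustering key: removes a single row.
    """
    where = f"{partition_key} = ?"
    if clustering_key is not None:
        where += f" AND {clustering_key} = ?"
    return f"DELETE FROM {table} WHERE {where}"


print(delete_cql("bdn_index_pub", "tshard"))
# prints: DELETE FROM bdn_index_pub WHERE tshard = ?
print(delete_cql("bdn_index_pub", "tshard", "pord"))
# prints: DELETE FROM bdn_index_pub WHERE tshard = ? AND pord = ?
```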
>>>
>>> A further mystery: 1285 tombstones in the bdn_index_pub memtable
>>> don't seem like nearly enough to create a memory problem. Perhaps there
>>> are other flaws in the memory metering. Or perhaps there is some other
>>> issue that causes Cassandra to mismanage the heap when there are a lot of
>>> deletes. One other thought I had is that I page through these tables and
>>> clean them out as I go. Perhaps there is some interaction between the
>>> paging and the deleting that causes the GC problems and I should create a
>>> list of keys to delete and then delete them after I've finished reading the
>>> entire table.
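The alternative floated here, reading the whole table first and deleting afterwards, can be sketched as a two-phase routine. A minimal Python sketch in which `pages`, `should_delete` and `delete_partition` are hypothetical stand-ins for the paging query, the cleanup rule and the DELETE call, and an in-memory dict plays the table. It holds every key to be deleted in memory until the scan finishes, and Benedict's reply attributes the GC problem to a bug (CASSANDRA-6655), so this restructuring may not avoid it:

```python
from typing import Callable, Iterable, Iterator, List, Set


def collect_then_delete(
    pages: Iterable[List[str]],
    should_delete: Callable[[str], bool],
    delete_partition: Callable[[str], None],
) -> int:
    """Scan every page first, then issue deletes; returns the number deleted."""
    doomed: Set[str] = set()
    # Phase 1: read the whole table without modifying it.
    for page in pages:
        for key in page:
            if should_delete(key):
                doomed.add(key)
    # Phase 2: delete only after the scan has finished.
    for key in sorted(doomed):  # sorted just to make the order deterministic
        delete_partition(key)
    return len(doomed)


# In-memory stand-in for a table: partition key -> value.
table = {"a": 1, "b": 2, "c": 3, "d": 4}
deleted: List[str] = []


def pages_of(keys: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield keys in fixed-size pages, like a paged SELECT."""
    snapshot = list(keys)
    for start in range(0, len(snapshot), size):
        yield snapshot[start:start + size]


def delete_partition(key: str) -> None:
    del table[key]
    deleted.append(key)


count = collect_then_delete(pages_of(table, 2), lambda k: table[k] % 2 == 0, delete_partition)
print(count, deleted, table)  # prints: 2 ['b', 'd'] {'a': 1, 'c': 3}
```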
>>>
>>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB)
>>> to 1 GB, in hopes that it would force Cassandra to flush tables before I
>>> ran into death by GC, but it didn't seem to help.
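Per the 2.0-era cassandra.yaml comments, when memtable_total_space_in_mb is omitted Cassandra sets it to one third of the heap, and it flushes the largest memtable once that much memory is in use. The arithmetic below assumes an 8 GB heap, which the thread does not state but which is consistent with the 2.7 GB estimate. It also shows why lowering the limit could not help if tombstones are metered as zero bytes: a metered size of 0 never reaches any positive limit. `default_memtable_space_mb` and `would_trigger_flush` are hypothetical helpers:

```python
def default_memtable_space_mb(max_heap_mb: int) -> int:
    # Assumed 2.0-era default: one third of the heap when the setting is omitted.
    return max_heap_mb // 3


def would_trigger_flush(metered_bytes: int, limit_mb: int) -> bool:
    # A flush is triggered once metered memtable usage reaches the limit.
    return metered_bytes >= limit_mb * 1024 * 1024


print(default_memtable_space_mb(8192))  # prints 2730, i.e. about 2.7 GB
print(would_trigger_flush(0, 1024))     # prints False: 0 metered bytes never reach 1 GB
```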
>>>
>>> I'm using Cassandra 2.0.4.
>>>
>>> Any insights would be greatly appreciated. I can't be the only one that
>>> has periodic delete-heavy workloads. Hopefully someone else has run into
>>> this and can give advice.
>>>
>>> Thanks
>>>
>>> Robert
>>>
>>
>>
>>
>> --
>> -----------------
>> Nate McCall
>> Austin, TX
>> @zznate
>>
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>
