incubator-cassandra-user mailing list archives

From Benedict Elliott Smith <belliottsm...@datastax.com>
Subject Re: Lots of deletions results in death by GC
Date Wed, 05 Feb 2014 22:40:32 GMT
You should find that the patch will apply cleanly to the 2.0.5 release, so
you could apply it yourself.


On 5 February 2014 18:56, Robert Wille <rwille@fold3.com> wrote:

> Thank you so much. Everything I had seen pointed to this being the case.
> I'm glad that someone in the know has confirmed this bug and fixed it. Now
> I just need to figure out where to go from here: do I wait, use the dev
> branch or work around.
>
> Robert
>
> From: Benedict Elliott Smith <belliottsmith@datastax.com>
> Reply-To: <user@cassandra.apache.org>
> Date: Wednesday, February 5, 2014 at 8:32 AM
>
> To: <user@cassandra.apache.org>
> Subject: Re: Lots of deletions results in death by GC
>
> I believe there is a bug, and I have filed a ticket for it:
> https://issues.apache.org/jira/browse/CASSANDRA-6655
>
> I will have a patch uploaded shortly, but it's just missed the 2.0.5
> release window, so you'll either need to grab the development branch once
> it's committed or wait until 2.0.6.
>
>
> On 5 February 2014 15:09, Robert Wille <rwille@fold3.com> wrote:
>
>> Yes. It's kind of an unusual workload. An insertion phase followed by a
>> deletion phase, generally not overlapping.
>>
>> From: Benedict Elliott Smith <belliottsmith@datastax.com>
>> Reply-To: <user@cassandra.apache.org>
>> Date: Tuesday, February 4, 2014 at 5:29 PM
>> To: <user@cassandra.apache.org>
>>
>> Subject: Re: Lots of deletions results in death by GC
>>
>> Is it possible you are generating *exclusively* deletes for this table?
>>
>>
>> On 5 February 2014 00:10, Robert Wille <rwille@fold3.com> wrote:
>>
>>> I ran my test again, and Flush Writer's "All time blocked" increased to
>>> 2 and then shortly thereafter GC went into its death spiral. I doubled
>>> memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and
>>> tried again.
>>>
>>> This time, the table that always sat with Memtable data size = 0 now
>>> showed increases in Memtable data size. That was encouraging. It never
>>> flushed, which isn't too surprising, because that table has relatively few
>>> rows and they are pretty wide. However, on the fourth table to clean, Flush
>>> Writer's "All time blocked" went to 1, and then there were no more
>>> completed events, and about 10 minutes later GC went into its death spiral.
>>> I assume that each time Flush Writer completes an event, that means a table
>>> was flushed. Is that right? Also, I got two dropped mutation messages at
>>> the same time that Flush Writer's All time blocked incremented.
>>>
>>> I then increased the writers and queue size to 3 and 12, respectively,
>>> and ran my test again. This time All time blocked remained at 0, but I
>>> still suffered death by GC.
>>>
>>> I would almost think that this is caused by high load on the server, but
>>> I've never seen CPU utilization go above about two of my eight available
>>> cores. If high load triggers this problem, then that is very disconcerting.
>>> That means that a CPU spike could permanently cripple a node. Okay, not
>>> permanently, but until a manual flush occurs.
>>>
>>> If anyone has any further thoughts, I'd love to hear them. I'm quite at
>>> the end of my rope.
>>>
>>> Thanks in advance
>>>
>>> Robert
>>>
>>> From: Nate McCall <nate@thelastpickle.com>
>>> Reply-To: <user@cassandra.apache.org>
>>> Date: Saturday, February 1, 2014 at 9:25 AM
>>> To: Cassandra Users <user@cassandra.apache.org>
>>> Subject: Re: Lots of deletions results in death by GC
>>>
>>> What's the output of 'nodetool tpstats' while this is happening?
>>> Specifically is Flush Writer "All time blocked" increasing? If so, play
>>> around with turning up memtable_flush_writers and memtable_flush_queue_size
>>> and see if that helps.
>>>
>>>
>>> On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille <rwille@fold3.com> wrote:
>>>
>>>> A few days ago I posted about an issue I'm having where GC takes a long
>>>> time (20-30 seconds), and it happens repeatedly and basically no work gets
>>>> done. I've done further investigation, and I now believe that I know the
>>>> cause. If I do a lot of deletes, it creates memory pressure until the
>>>> memtables are flushed, but Cassandra doesn't flush them. If I manually
>>>> flush, then life is good again (although that takes a very long time
>>>> because of the GC issue). If I just leave the flushing to Cassandra, then I
>>>> end up with death by GC. I believe that when the memtables are full of
>>>> tombstones, Cassandra doesn't realize how much memory the memtables are
>>>> actually taking up, and so it doesn't proactively flush them in order to
>>>> free up heap.
>>>>
>>>> As I was deleting records out of one of my tables, I was watching it
>>>> via nodetool cfstats, and I found a very curious thing:
>>>>
>>>>                 Memtable cell count: 1285
>>>>                 Memtable data size, bytes: 0
>>>>                 Memtable switch count: 56
>>>>
>>>> As the deletion process was chugging away, the memtable cell count
>>>> increased, as expected, but the data size stayed at 0. No flushing
>>>> occurred.
>>>>
>>>> Here's the schema for this table:
>>>>
>>>> CREATE TABLE bdn_index_pub (
>>>>     tshard VARCHAR,
>>>>     pord INT,
>>>>     ord INT,
>>>>     hpath VARCHAR,
>>>>     page BIGINT,
>>>>     PRIMARY KEY (tshard, pord)
>>>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>>>
>>>> I have a few tables that I run this cleaning process on, and not all of
>>>> them exhibit this behavior. One of them reported an increasing number of
>>>> bytes, as expected, and it also flushed as expected. Here's the schema for
>>>> that table:
>>>>
>>>>
>>>> CREATE TABLE bdn_index_child (
>>>>     ptshard VARCHAR,
>>>>     ord INT,
>>>>     hpath VARCHAR,
>>>>     PRIMARY KEY (ptshard, ord)
>>>> ) WITH gc_grace_seconds = 0 AND compaction = { 'class' :
>>>> 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
>>>>
>>>> In both cases, I'm deleting the entire record (i.e. specifying just the
>>>> first component of the primary key in the delete statement). Most records
>>>> in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has
>>>> just a handful of rows, but a few records can have up to 10,000.
>>>>
>>>> Still a further mystery, 1285 tombstones in the bdn_index_pub memtable
>>>> doesn't seem like nearly enough to create a memory problem. Perhaps there
>>>> are other flaws in the memory metering. Or perhaps there is some other
>>>> issue that causes Cassandra to mismanage the heap when there are a lot of
>>>> deletes. One other thought I had is that I page through these tables and
>>>> clean them out as I go. Perhaps there is some interaction between the
>>>> paging and the deleting that causes the GC problems and I should create a
>>>> list of keys to delete and then delete them after I've finished reading the
>>>> entire table.
>>>>
>>>> I reduced memtable_total_space_in_mb from the default (probably 2.7 GB)
>>>> to 1 GB, in hopes that it would force Cassandra to flush tables before I
>>>> ran into death by GC, but it didn't seem to help.
>>>>
>>>> I'm using Cassandra 2.0.4.
>>>>
>>>> Any insights would be greatly appreciated. I can't be the only one that
>>>> has periodic delete-heavy workloads. Hopefully someone else has run into
>>>> this and can give advice.
>>>>
>>>> Thanks
>>>>
>>>> Robert
>>>>
>>>
>>>
>>>
>>> --
>>> -----------------
>>> Nate McCall
>>> Austin, TX
>>> @zznate
>>>
>>> Co-Founder & Sr. Technical Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>>
>
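
The symptom reported in the original message (memtable cell count rising while the reported data size stays at 0) can be checked for mechanically. The following is only a rough sketch: the sample text is the cfstats excerpt quoted in this thread, while the helper names parse_memtable_stats and looks_undercounted are made up for illustration and are not part of nodetool or Cassandra.

```python
import re

# Sample copied from the cfstats excerpt quoted earlier in this thread.
SAMPLE = """\
                Memtable cell count: 1285
                Memtable data size, bytes: 0
                Memtable switch count: 56
"""


def parse_memtable_stats(text):
    """Collect 'Memtable ...: <integer>' lines from cfstats-style text.

    Hypothetical helper: it only understands lines shaped like the
    three shown in SAMPLE.
    """
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Memtable [^:]+):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def looks_undercounted(stats):
    """True when cells are present but the reported data size is still zero."""
    cells = stats.get("Memtable cell count", 0)
    size = stats.get("Memtable data size, bytes", 0)
    return cells > 0 and size == 0


stats = parse_memtable_stats(SAMPLE)
print(stats)
print("suspicious:", looks_undercounted(stats))
```

In practice the text would come from the output of nodetool cfstats rather than a hard-coded string; that wiring is left out here.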

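The hypothesis discussed in this thread is that tombstones occupy heap without being counted toward the memtable's reported size, so a size-based flush never triggers. A toy model of that idea, assuming a single size threshold, might look like the following. This is not Cassandra's implementation: the ToyMemtable class, its fields, the 64-byte overhead, and the 1000-byte threshold are all invented for illustration.

```python
class ToyMemtable:
    """Hypothetical model of size-based flushing (NOT Cassandra's code)."""

    def __init__(self, flush_threshold_bytes, count_tombstones):
        self.flush_threshold_bytes = flush_threshold_bytes
        # When False, deletes use memory but are not added to reported_bytes.
        self.count_tombstones = count_tombstones
        self.cells = 0
        self.reported_bytes = 0  # what the flush decision looks at
        self.actual_bytes = 0    # what the heap really holds
        self.flushes = 0

    def _maybe_flush(self):
        # Flush only when the *reported* size reaches the threshold.
        if self.reported_bytes >= self.flush_threshold_bytes:
            self.flushes += 1
            self.cells = 0
            self.reported_bytes = 0
            self.actual_bytes = 0

    def delete(self, overhead_bytes):
        """Record one tombstone that costs overhead_bytes of heap."""
        self.cells += 1
        self.actual_bytes += overhead_bytes
        if self.count_tombstones:
            self.reported_bytes += overhead_bytes
        self._maybe_flush()


def run_deletes(count_tombstones, n_deletes=100, overhead_bytes=64):
    """Apply a delete-only workload and return the resulting memtable."""
    memtable = ToyMemtable(flush_threshold_bytes=1000,
                           count_tombstones=count_tombstones)
    for _ in range(n_deletes):
        memtable.delete(overhead_bytes)
    return memtable


undercounting = run_deletes(count_tombstones=False)
counting = run_deletes(count_tombstones=True)
print("undercounting:", undercounting.flushes, undercounting.actual_bytes)
print("counting:", counting.flushes, counting.actual_bytes)
```

In the undercounting case the reported size stays at zero, so the model never flushes and the retained bytes grow with every delete, which matches the behavior described for bdn_index_pub. Whether this is the exact mechanism behind CASSANDRA-6655 is for the ticket to say.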