incubator-cassandra-user mailing list archives

From William Oberman <ober...@civicscience.com>
Subject Re: how to stop out of control compactions?
Date Tue, 02 Apr 2013 14:43:08 GMT
Edward, you make a good point, and I do think I am getting closer to having
to increase my cluster size (I'm at ~300GB/node now).

In my case, I think it was neither.  I had one node OOM after working on a
large compaction but it continued to run in a zombie-like state (constantly
GC'ing), which I didn't have an alert on.  Then I had the bad luck of a
"close token" also starting a large compaction.  I have RF=3 with some of
my R/W patterns at quorum, causing that segment of my cluster to get slow
(e.g. a % of my traffic started to slow).  I was running 1.1.2 (I
haven't had to poke anything for quite some time, obviously), so I upgraded
before moving on (as I saw a lot of bug fixes to compaction issues in
release notes).  But the upgrade caused even more nodes to start
compactions.  Which led to my original email... I had a cluster where 80%
of my nodes were compacting, and I really needed to boost production
traffic and couldn't seem to "tamp cassandra down" temporarily.

Thanks for the advice everyone!

will


On Tue, Apr 2, 2013 at 10:20 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> Settings do not make compactions go away. If your compactions are "out of
> control" it usually means one of these things:
> 1) you have a corrupt table that the compaction never finishes on, so the
> sstable count keeps growing
> 2) you do not have enough hardware to handle your write load
>
>
> On Tue, Apr 2, 2013 at 7:50 AM, William Oberman <oberman@civicscience.com> wrote:
>
>> Thanks Gregg & Aaron. Missed that setting!
>>
>> On Tuesday, April 2, 2013, aaron morton wrote:
>>
>>> Set the min and max
>>> compaction thresholds for a given column family
>>>
>>> +1 for setting the max_compaction_threshold (as well as the min) on
>>> a CF when you are getting behind. It can limit the size of the compactions
>>> and give things a chance to complete in a reasonable time.
>>>
>>> Cheers
>>>
>>>    -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 2/04/2013, at 3:42 AM, Gregg Ulrich <gulrich@netflix.com> wrote:
>>>
>>> You may want to set compaction threshold and not throughput.  If you set
>>> the min threshold to something very large (100000), compactions will not
>>> start until cassandra finds this many files to compact (which it should
>>> not).
>>>
>>> In the past I have used this to stop compactions on a node, and then run
>>> an offline major compaction to get through the compaction, then set the min
>>> threshold back.  Not everyone likes major compactions though.
>>>
>>>
>>>
>>>   setcompactionthreshold <keyspace> <cfname> <minthreshold> <maxthreshold>
>>>       - Set the min and max compaction thresholds for a given column family
>>>
>>>
>>>
>>> On Mon, Apr 1, 2013 at 12:38 PM, William Oberman <
>>> oberman@civicscience.com> wrote:
>>>
>>>> I'll skip the prelude, but I worked myself into a bit of a jam.  I'm
>>>> recovering now, but I want to double check if I'm thinking about things
>>>> correctly.
>>>>
>>>> Basically, I was in a state where a majority of my servers wanted to do
>>>> compactions, and rather large ones.  This was impacting my site
>>>> performance.  I tried nodetool stop COMPACTION.  I tried
>>>> setcompactionthroughput=1.  I tried restarting servers, but they'd restart
>>>> the compactions pretty much immediately on boot.
>>>>
>>>> Then I realized that:
>>>> nodetool stop COMPACTION
>>>> only stopped running compactions, and then the compactions would
>>>> re-enqueue themselves rather quickly.
>>>>
>>>> So, right now I have:
>>>> 1.) scripts running on N-1 servers looping on "nodetool stop
>>>> COMPACTION" in a tight loop
>>>> 2.) On the "Nth" server I've disabled gossip/thrift and turned up
>>>> setcompactionthroughput to 999
>>>> 3.) When the Nth server completes, I pick from the remaining N-1 (well,
>>>> I'm still running the first compaction, which is going to take 12 more
>>>> hours, but that is the plan at least).
>>>>
>>>> Does this make sense?  Other than the fact there were probably warning
>>>> signs that would have prevented me from getting into this state in the
>>>> first place? :-)
>>>>
>>>> will
>>>>
>>>
>>>
>>>
>>
>> --
>> Will Oberman
>> Civic Science, Inc.
>> 6101 Penn Avenue, Fifth Floor
>> Pittsburgh, PA 15206
>> (M) 412-480-7835
>> (E) oberman@civicscience.com
>>
>
>
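
For anyone reading this thread later, here is a minimal sketch of the two
approaches discussed above: raising the min compaction threshold to pause
compactions on a node (Gregg's suggestion), and taking one node out of
traffic so it can finish compacting at full speed (the plan in the original
message). It only builds nodetool argument lists and runs nothing. The
helper names are hypothetical. Assumptions not stated in the thread: the
"-h" host flag is used to target a node, the max threshold is set equal to
the large min value (I believe nodetool rejects min > max, and the thread
does not say what max was used), 4 and 32 are the usual default thresholds
to restore, and "disabled gossip/thrift" maps to the disablegossip and
disablethrift subcommands.

```python
from typing import List


def nodetool_cmd(host: str, *args: str) -> List[str]:
    """Build (but do not run) a nodetool argument list for one host."""
    return ["nodetool", "-h", host, *args]


def pause_compaction(host: str, keyspace: str, cfname: str,
                     threshold: int = 100000) -> List[List[str]]:
    """Raise the thresholds so new compactions are not picked up,
    then stop the compactions that are already running.

    Assumption: max is set to the same large value as min.
    """
    t = str(threshold)
    return [
        nodetool_cmd(host, "setcompactionthreshold", keyspace, cfname, t, t),
        nodetool_cmd(host, "stop", "COMPACTION"),
    ]


def restore_thresholds(host: str, keyspace: str, cfname: str,
                       min_threshold: int = 4,
                       max_threshold: int = 32) -> List[List[str]]:
    """Put the thresholds back. 4/32 are assumed defaults; pass the
    values your column family actually had before pausing."""
    return [
        nodetool_cmd(host, "setcompactionthreshold", keyspace, cfname,
                     str(min_threshold), str(max_threshold)),
    ]


def drain_one_node(host: str, throughput_mb: int = 999) -> List[List[str]]:
    """Take one node out of client traffic and let it compact quickly."""
    return [
        nodetool_cmd(host, "disablegossip"),
        nodetool_cmd(host, "disablethrift"),
        nodetool_cmd(host, "setcompactionthroughput", str(throughput_mb)),
    ]
```

Each returned list can be printed for review or passed to
subprocess.run(cmd, check=True) once you are happy with it. Pausing this way
avoids the tight "nodetool stop COMPACTION" loop, since compactions should
not re-enqueue while the min threshold is that high.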
