incubator-cassandra-user mailing list archives

From William Oberman <ober...@civicscience.com>
Subject Re: how to stop out of control compactions?
Date Tue, 02 Apr 2013 15:13:16 GMT
I just tried this setting (I'm using 1.1.9), and it appears I can't set
min > 32, since 32 is now the upper limit for max (using nodetool at least).
Not sure if JMX would allow more access, but I don't like bypassing things I
don't fully understand.  I think I'll just leave my compaction killers
running instead (not that constantly killing compactions isn't messing with
things as well...).

will
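
For readers following along, the threshold change discussed in this thread might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names and the NODETOOL variable are hypothetical helpers (not part of Cassandra), the limit of 32 is the one reported above for 1.1.9, and the restore values 4 and 32 are assumed defaults that should be checked with "nodetool getcompactionthreshold" before relying on them. By default it only prints the commands (dry run).

```shell
#!/bin/sh
# Dry run by default: each command is printed rather than executed.
# Export NODETOOL=nodetool to run against a real node.
# $NODETOOL is left unquoted on purpose so "echo nodetool" splits into words.
NODETOOL="${NODETOOL:-echo nodetool}"

# Upper limit reported in this thread for 1.1.9 (an assumption elsewhere).
MAX_ALLOWED=32

# raise_thresholds <keyspace> <cfname>
# Sets min and max to the upper limit so that fewer minor compactions
# should be triggered while the node catches up.
raise_thresholds() {
    $NODETOOL setcompactionthreshold "$1" "$2" "$MAX_ALLOWED" "$MAX_ALLOWED"
}

# restore_thresholds <keyspace> <cfname> [min] [max]
# Puts the thresholds back; 4 and 32 are assumed defaults.
restore_thresholds() {
    $NODETOOL setcompactionthreshold "$1" "$2" "${3:-4}" "${4:-32}"
}
```

A call such as "raise_thresholds MyKeyspace MyCF" (placeholder names) would be followed later by "restore_thresholds MyKeyspace MyCF". As noted elsewhere in the thread, this only delays compactions; it does not remove the underlying backlog.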


On Tue, Apr 2, 2013 at 10:43 AM, William Oberman
<oberman@civicscience.com> wrote:

> Edward, you make a good point, and I do think I am getting closer to having
> to increase my cluster size (I'm around 300GB/node now).
>
> In my case, I think it was neither.  I had one node OOM after working on a
> large compaction but it continued to run in a zombie-like state (constantly
> GC'ing), which I didn't have an alert on.  Then I had the bad luck of a
> "close token" also starting a large compaction.  I have RF=3 with some of
> my R/W patterns at quorum, causing that segment of my cluster to get slow
> (e.g. a % of my traffic started to slow).  I was running 1.1.2 (I
> haven't had to poke anything for quite some time, obviously), so I upgraded
> before moving on (as I saw a lot of bug fixes to compaction issues in
> release notes).  But the upgrade caused even more nodes to start
> compactions.  Which led to my original email... I had a cluster where 80%
> of my nodes were compacting, and I really needed to boost production
> traffic and couldn't seem to "tamp cassandra down" temporarily.
>
> Thanks for the advice everyone!
>
> will
>
>
> On Tue, Apr 2, 2013 at 10:20 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
>> Settings do not make compactions go away. If your compactions are "out of
>> control" it usually means one of these things:
>> 1) you have a corrupt table that the compaction never finishes on, so the
>> sstable count keeps growing
>> 2) you do not have enough hardware to handle your write load
>>
>>
>> On Tue, Apr 2, 2013 at 7:50 AM, William Oberman <oberman@civicscience.com> wrote:
>>
>>> Thanks Gregg & Aaron. Missed that setting!
>>>
>>> On Tuesday, April 2, 2013, aaron morton wrote:
>>>
>>>> Set the min and max
>>>> compaction thresholds for a given column family
>>>>
>>>> +1 for setting the max_compaction_threshold (as well as the min) on a
>>>> CF when you are getting behind. It can limit the size of the compactions
>>>> and give things a chance to complete in a reasonable time.
>>>>
>>>> Cheers
>>>>
>>>>    -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Consultant
>>>> New Zealand
>>>>
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 2/04/2013, at 3:42 AM, Gregg Ulrich <gulrich@netflix.com> wrote:
>>>>
>>>> You may want to set compaction threshold and not throughput.  If you
>>>> set the min threshold to something very large (100000), compactions will
>>>> not start until cassandra finds this many files to compact (which it should
>>>> not).
>>>>
>>>> In the past I have used this to stop compactions on a node, and then
>>>> run an offline major compaction to get through the compaction, then set the
>>>> min threshold back.  Not everyone likes major compactions though.
>>>>
>>>>
>>>>
>>>>   setcompactionthreshold <keyspace> <cfname> <minthreshold> <maxthreshold>
>>>>       - Set the min and max compaction thresholds for a given column family
>>>>
>>>>
>>>>
>>>> On Mon, Apr 1, 2013 at 12:38 PM, William Oberman <oberman@civicscience.com> wrote:
>>>>
>>>>> I'll skip the prelude, but I worked myself into a bit of a jam.  I'm
>>>>> recovering now, but I want to double check if I'm thinking about things
>>>>> correctly.
>>>>>
>>>>> Basically, I was in a state where a majority of my servers wanted to
>>>>> do compactions, and rather large ones.  This was impacting my site
>>>>> performance.  I tried nodetool stop COMPACTION.  I tried
>>>>> setcompactionthroughput=1.  I tried restarting servers, but they'd restart
>>>>> the compactions pretty much immediately on boot.
>>>>>
>>>>> Then I realized that:
>>>>> nodetool stop COMPACTION
>>>>> only stopped running compactions, and then the compactions would
>>>>> re-enqueue themselves rather quickly.
>>>>>
>>>>> So, right now I have:
>>>>> 1.) scripts running on N-1 servers looping on "nodetool stop
>>>>> COMPACTION" in a tight loop
>>>>> 2.) On the "Nth" server I've disabled gossip/thrift and turned up
>>>>> setcompactionthroughput to 999
>>>>> 3.) When the Nth server completes, I pick from the remaining N-1
>>>>> (well, I'm still running the first compaction, which is going to take 12
>>>>> more hours, but that is the plan at least).
>>>>>
>>>>> Does this make sense?  Other than the fact there were probably warning
>>>>> signs that would have prevented me from getting into this state in the
>>>>> first place? :-)
>>>>>
>>>>> will
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
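
The rolling plan described in the original message (keep stopping compactions on most nodes while one isolated node compacts at full speed) could be sketched as follows. This is a hedged illustration, not the poster's actual script: the function names, the NODETOOL and INTERVAL variables, and the bounded iteration count are hypothetical, and the rejoin step is an assumption because the thread does not describe how the node was brought back. By default it only prints the commands (dry run).

```shell
#!/bin/sh
# Dry run by default: each command is printed rather than executed.
# Export NODETOOL=nodetool to run against a real node.
# $NODETOOL is left unquoted on purpose so "echo nodetool" splits into words.
NODETOOL="${NODETOOL:-echo nodetool}"
# Seconds between stop requests (hypothetical knob; the thread used a tight loop).
INTERVAL="${INTERVAL:-5}"

# stop_compactions_loop <iterations>
# Run on the N-1 nodes that should stay responsive. Stopped compactions
# get re-enqueued, so the stop request is repeated.
stop_compactions_loop() {
    i=0
    while [ "$i" -lt "$1" ]; do
        $NODETOOL stop COMPACTION
        i=$((i + 1))
        sleep "$INTERVAL"
    done
}

# isolate_and_compact
# Run on the Nth node: take it out of client traffic, then let its
# compactions run with a high throughput limit (999 as in the thread).
isolate_and_compact() {
    $NODETOOL disablegossip
    $NODETOOL disablethrift
    $NODETOOL setcompactionthroughput 999
}

# rejoin_node <throughput_mb_per_sec>
# Assumed cleanup step: restore the previous throughput limit and
# re-enable gossip and thrift once the node has caught up.
rejoin_node() {
    $NODETOOL setcompactionthroughput "$1"
    $NODETOOL enablegossip
    $NODETOOL enablethrift
}
```

Progress on the isolated node can be watched with "nodetool compactionstats" before calling rejoin_node with whatever throughput value the cluster normally uses.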
