Subject: Re: how to stop out of control compactions?
From: Gregg Ulrich
To: user@cassandra.apache.org
Date: Mon, 1 Apr 2013 15:12:17 -0700

You may want to set the compaction threshold, not the throughput.  If you
set the min threshold to something very large (100000), compactions will
not start until Cassandra finds that many files to compact (which it
should not).

In the past I have used this to stop compactions on a node, then run an
offline major compaction to get through the backlog, and then set the min
threshold back.  Not everyone likes major compactions, though.

  setcompactionthreshold <keyspace> <cfname> <minthreshold> <maxthreshold>
  - Set the min and max compaction thresholds for a given column family

On Mon, Apr 1, 2013 at 12:38 PM, William Oberman <oberman@civicscience.com> wrote:

> I'll skip the prelude, but I worked myself into a bit of a jam.  I'm
> recovering now, but I want to double-check that I'm thinking about things
> correctly.
>
> Basically, I was in a state where a majority of my servers wanted to do
> compactions, and rather large ones.
> This was impacting my site performance.  I tried nodetool stop
> COMPACTION.  I tried setcompactionthroughput=1.  I tried restarting
> servers, but they'd restart the compactions pretty much immediately on
> boot.
>
> Then I realized that:
>   nodetool stop COMPACTION
> only stopped running compactions, and then the compactions would
> re-enqueue themselves rather quickly.
>
> So, right now I have:
> 1.) scripts running on N-1 servers looping on "nodetool stop COMPACTION"
> in a tight loop
> 2.) On the "Nth" server, I've disabled gossip/thrift and turned up
> setcompactionthroughput to 999
> 3.) When the Nth server completes, I pick the next one from the remaining
> N-1 (well, I'm still running the first compaction, which is going to take
> 12 more hours, but that is the plan at least).
>
> Does this make sense?  Other than the fact that there were probably
> warning signs that would have prevented me from getting into this state
> in the first place? :-)
>
> will
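The threshold approach Gregg describes can be sketched as a pair of nodetool invocations. The keyspace and column family names here (MyKeyspace, MyCF) are placeholders, and the restored values of 4 and 32 are the usual SizeTieredCompactionStrategy defaults; substitute whatever your schema actually uses.

```shell
# Effectively pause minor compactions by raising the min threshold so high
# that Cassandra never finds enough SSTables to trigger one.
nodetool setcompactionthreshold MyKeyspace MyCF 100000 100000

# ...run the offline major compaction to work through the backlog...

# Then restore the thresholds (4 and 32 are the common defaults).
nodetool setcompactionthreshold MyKeyspace MyCF 4 32
```

Note this is per column family, so a node with many CFs needs the command run for each one.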
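A minimal version of the stop-loop script William describes in step 1 might look like the following. This is a guess at what such a script would contain (the 5-second interval is arbitrary), assuming nodetool is on the PATH of each affected node.

```shell
#!/bin/sh
# Keep killing compactions as they re-enqueue themselves: "nodetool stop
# COMPACTION" only stops the currently running compaction, so it has to be
# re-issued continuously until the node can be dealt with properly.
while true; do
    nodetool stop COMPACTION
    sleep 5
done
```

Run it in the background (or under screen/tmux) and kill it once the node's thresholds or throughput have been adjusted.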