From: Terje Marthinussen <tmarthinussen@gmail.com>
To: user@cassandra.apache.org
Date: Sun, 8 May 2011 02:20:48 +0900
Subject: Re: compaction strategy

This is an all-SSD system. I have no problems with read/write performance due to I/O.

I do have a potential problem with the crazy explosion in disk use you can get if compaction cannot keep up.

As things fall behind and you get many generations of data, yes, read performance becomes a problem due to the number of sstables.

As things start falling behind, you have a bunch of minor compactions trying to merge 20 MB sstables (the size Cassandra generally dumps with the current config when under pressure) into 40 MB, into 80 MB, into....

Anyone want to do the math on how many times you are rewriting the data going this route? There is just no way this can keep up; it will just fall more and more behind. The only way to recover, as far as I can see, would be to trigger a full compaction?

It does not really make sense to me to go through all these minor merges when a full compaction will do a much faster and better job.

Terje

On Sat, May 7, 2011 at 9:54 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen
> <tmarthinussen@gmail.com> wrote:
> > 1. Would it make sense to make full compactions occur a bit more
> > aggressively?
> I'd rather reduce the performance impact of being behind than do more
> full compactions: https://issues.apache.org/jira/browse/CASSANDRA-2498
>
> > 2. I would think the code should be smart enough to either trigger a full
> > compaction and scrap the current queue, or at least merge some of those
> > pending tasks into larger ones.
>
> Not crazy, but a queue-rewriter would be nontrivial. For now I'm okay
> with saying "add capacity until compaction can mostly keep up." (Most
> people's problem is making compaction LESS aggressive, hence
> https://issues.apache.org/jira/browse/CASSANDRA-2156.)
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
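[Archive note] The rewrite math Terje alludes to can be sketched quickly: if 20 MB flushes are merged pairwise (20 -> 40 -> 80 -> ...), each byte gets rewritten once per merge generation, i.e. roughly log2(total / 20) times before everything lands in one sstable. A back-of-the-envelope Python sketch, assuming the pairwise merging described in the mail (real size-tiered compaction merges `min_compaction_threshold` sstables at a time, 4 by default, which only changes the log base):

```python
import math

def rewrites(total_mb, flush_mb=20, merge_factor=2):
    """Merge generations needed to compact flush-sized sstables into a
    single sstable of total_mb; each generation rewrites every byte once."""
    return math.ceil(math.log(total_mb / flush_mb, merge_factor))

for total_mb in (1_000, 10_000, 100_000):  # 1 GB, 10 GB, 100 GB of data
    print(f"{total_mb:>6} MB: each byte rewritten ~{rewrites(total_mb)} times")
```

For 100 GB this gives roughly 13 rewrite generations, versus exactly one rewrite for a single full compaction over the same data, which is the asymmetry the mail complains about.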