From: "Dan Hendry"
Subject: RE: Compaction -> CPU load 100% -> time out
Date: Tue, 15 Nov 2011 11:43:51 -0500
Message-ID: <4ec296d8.a524340a.6000.09df@mx.google.com>

I really don't recommend using t1.micros. The problem with them is CPU bursting: you get lots of CPU resources for a short time, but if you use more than you have been allocated you get basically nothing for 10+ seconds afterwards. By 'basically nothing' I really mean that - the machine is effectively dead. The biggest problem with this (which we found out the hard way, within a test environment thankfully) is that it makes capacity planning extremely difficult - the line between a cluster with sufficient capacity and an overloaded one is extremely abrupt and very difficult to see coming. Moreover, once you are over capacity, the 'dead periods' caused by CPU bursting make things spiral out of control rapidly, because overly aggressive client retries and hinted handoff increase overall load (although the HH problem might have improved with 1.0.x). I would recommend m1.smalls at the very least.

If you are set on micros, make sure you only ever trigger compaction on one node at a time (or better, consider whether you even need to trigger major compactions at all), set compaction_throughput_mb_per_sec (cassandra.yaml) as low as you possibly can (1 is the minimum, I believe), try disabling hinted handoff (on all nodes), and use lower read/write consistency levels if you can.

Dan

From: Alain RODRIGUEZ [mailto:arodrime@gmail.com]
Sent: November-15-11 6:34
To: user@cassandra.apache.org
Subject: Compaction -> CPU load 100% -> time out

Hi, I'm running a 3-node Cassandra 1.0.2 cluster on 3 Amazon EC2 t1.micro instances.

I managed to fix some OOMs I had, but I still have some spikes of CPU load.

I know that t1.micro instances have limited resources, but I think they could be enough if they were well managed.

My application works well, except when Cassandra needs to run a compaction on a node. To do it, Cassandra uses 100% of the CPU, generating a lot of timeouts. My timeout is configured to 250 ms with 2 attempts max. I'm running in production; our current system uses MySQL and we are trying to replace MySQL with Cassandra. Cassandra mustn't slow down the production environment while we use both DBs in parallel, which is why I can't increase the timeout.

Running this compaction in the background somehow could be a good idea. After searching on this subject, I tried adding JVM_OPTS="$JVM_OPTS -Dcassandra.compaction.priority=1" to cassandra-env.sh.

This option was added for Cassandra 0.6.3; is it still useful? It doesn't resolve my problem.

Anyway, this doesn't help while performing a nodetool repair; the CPU load is still 100%.

Is there a way to turn these exceptional tasks into background tasks that use only spare CPU?

Is there a way to get Cassandra working properly on EC2 t1.micros?

Thanks,

Alain
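
For reference, the two cassandra.yaml settings suggested in the reply might look roughly like the fragment below. This is only a sketch: it assumes the option names as they appear in the 1.0.x cassandra.yaml, the values are illustrative rather than tested on t1.micros, and whether 1 really is the lowest usable throttle is the reply's own guess, not something confirmed here.

```yaml
# cassandra.yaml - apply on every node in the cluster.
# Values below are illustrative assumptions, not tuned recommendations.

# Throttle compaction I/O to the lowest value suggested in the reply.
compaction_throughput_mb_per_sec: 1

# Turn off hinted handoff so that a struggling node does not
# receive extra replayed writes on top of its normal load.
hinted_handoff_enabled: false
```

The remaining advice (triggering compaction on one node at a time and lowering read/write consistency levels) is operational or client-side and is not expressed in this file.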