From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: Stream fails during repair, two nodes out-of-memory
Date: Mon, 25 Mar 2013 06:11:02 +1300

> compaction needs some disk I/O. Slowing down our compaction will improve overall
> system performance. Of course, you don't want to go too slow and fall behind too much.
In this case I was thinking of the memory use.

Compaction tasks are a bit like a storm of reads. If you are having problems with memory management, all those reads can result in increased GC.

> It looks like we hit OOM when repair starts streaming
> multiple cfs simultaneously.

Odd. It's not very memory intensive.

> I'm wondering if I should throttle streaming, and/or repair only one
> CF at a time.

Decreasing stream_throughput_outbound_megabits_per_sec may help, if the goal is just to get repair working.
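
As a rough sketch of both ideas (the throughput number is only illustrative, and MyKeyspace / MyColumnFamily are placeholders for your own schema):

    # cassandra.yaml - throttle outbound streaming; the node needs a restart to pick up a yaml change
    stream_throughput_outbound_megabits_per_sec: 100

    # and repair a single keyspace / column family at a time, e.g.
    nodetool repair MyKeyspace MyColumnFamily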

You may also want to increase phi_convict_threshold to 12; this will make it harder for a node to get marked as down, which can be handy when GC is causing problems and you have underpowered nodes. If a node is marked as down, the repair session will fail instantly.
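
In cassandra.yaml that would look something like (the default is 8):

    # make the failure detector slower to convict a node that is pausing for GC
    phi_convict_threshold: 12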

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 24/03/2013, at 9:12 AM, Dane Miller <dane@optimalsocial.com> wrote:

> On Fri, Mar 22, 2013 at 5:58 PM, Wei Zhu <wz1975@yahoo.com> wrote:
>> compaction needs some disk I/O. Slowing down our compaction will improve overall
>> system performance. Of course, you don't want to go too slow and fall behind too much.

> Hmm. Even after making the suggested configuration changes, repair
> still fails with OOM (but only one node died this time, which is an
> improvement). It looks like we hit OOM when repair starts streaming
> multiple cfs simultaneously. Just prior to OOM, the node loses
> contact with another node in the cluster and starts storing hints.

> I'm wondering if I should throttle streaming, and/or repair only one
> CF at a time.

>> From: "Dane Miller"
>> Subject: Re: Stream fails during repair, two nodes out-of-memory

>> On Thu, Mar 21, 2013 at 10:28 AM, aaron morton <aaron@thelastpickle.com> wrote:
>>> heap of 1867M is kind of small. According to the discussion on this list,
>>> it's advisable to have m1.xlarge.

>>> +1

>>> In cassandra-env.sh set the MAX_HEAP_SIZE to 4GB, and the NEW_HEAP_SIZE to
>>> 400M.
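
(A rough sketch of that in cassandra-env.sh; note the new generation setting is spelled HEAP_NEWSIZE in that file:)

    # cassandra-env.sh - fixed 4GB heap with a small new generation
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"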

>>> In the yaml file set

>>> in_memory_compaction_limit_in_mb to 32
>>> compaction_throughput_mb_per_sec to 8
>>> concurrent_compactors to 2

>>> This will slow down compaction a lot. You may want to restore some of these
>>> settings once you have things stable.
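
(Roughly, the corresponding cassandra.yaml lines, using the values suggested above:)

    # cassandra.yaml - rein in compaction while the cluster is unstable
    in_memory_compaction_limit_in_mb: 32
    compaction_throughput_mb_per_sec: 8
    concurrent_compactors: 2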

>>> You have an underpowered box for what you are trying to do.

>> Thanks very much for the info. Have made the changes and am retrying.
>> I'd like to understand, why does it help to slow compaction?

>> It does seem like the cluster is underpowered to handle our
>> application's full write load plus repairs, but it operates fine
>> otherwise.

>> On Wed, Mar 20, 2013 at 8:47 PM, Wei Zhu <wz1975@yahoo.com> wrote:
>>> It's clear you are out of memory. How big is your data size?

>> 120 GB per node, of which 50% is actively written/updated, and 50% is
>> read-mostly.

>> Dane

