From: Bhuvan Rawal
Date: Wed, 22 Jun 2016 18:20:21 +0530
Subject: Re: High Heap Memory usage during nodetool repair in Cassandra 3.0.3
To: user@cassandra.apache.org
Thanks for the info Paulo, Robert. I tried further testing with other parameters and the issue persisted. It could be either 11739 or 11206, but I'm skeptical about 11739 because repair works well in 3.5, and 11739 seems to be fixed only in 3.7/3.0.7.

We may resolve this by increasing the heap size (thereby reducing the memory available to the page cache) before upgrading to a newer version.

On Mon, Jun 20, 2016 at 10:00 PM, Paulo Motta wrote:

> You could also be hitting CASSANDRA-11739, which was fixed in 3.0.7 and
> could potentially cause OOMs for long-running repairs.
>
> 2016-06-20 13:26 GMT-03:00 Robert Stupp:
>
>> One possibility might be CASSANDRA-11206 (support large partitions on the
>> 3.0 sstable format), which reduces heap usage for other operations (like
>> repair and compactions) as well.
>> You can verify that by setting column_index_cache_size_in_kb in c.yaml to
>> a really high value like 10000000; if you see the same behaviour in 3.7
>> with that setting, there's not much you can do except upgrade to 3.7, as
>> that change went into 3.6 and not into 3.0.x.
>>
>> --
>> Robert Stupp
>> @snazy
>>
>> On 20 Jun 2016, at 18:13, Bhuvan Rawal wrote:
>>
>> Hi All,
>>
>> We are running Cassandra 3.0.3 in production with a max heap size of 8 GB.
>> There has been a consistent issue with nodetool repair for a while, and
>> we have tried issuing it with multiple options (--pr, --local as well);
>> sometimes the node went down with an OutOfMemory error, and at times nodes
>> stopped accepting any connections, even JMX nodetool commands.
>>
>> Running repair on the same data on 3.7 succeeded without encountering any
>> of the above issues. I then tried increasing the heap to 16 GB on 3.0.3
>> and repair ran successfully.
>>
>> I then compared memory usage during nodetool repair for 3.0.3 (16 GB heap)
>> vs 3.7 (8 GB heap): 3.0.3 occupied 11-14 GB at all times, whereas 3.7
>> stayed between 1 and 4.5 GB while repair ran. Both were full repairs on
>> the same dataset with the same unrepaired data.
>>
>> We would like to know whether this is a known bug that was fixed after
>> 3.0.3, and whether there is a way to run repair on 3.0.3 without
>> increasing the heap size, since 8 GB works for us for all other activities.
>>
>> PFA the VisualVM snapshots.
>>
>> 3.0.3 VisualVM snapshot: consistent heap usage of greater than 12 GB.
>>
>> 3.7 VisualVM snapshot: 8 GB max heap, with max heap usage of about 5 GB.
>>
>> Thanks & Regards,
>> Bhuvan Rawal
>>
>> PS: In case the snapshots are not visible, they can be viewed at the
>> following links:
>> 3.0.3: https://s31.postimg.org/4e7ifsjaz/Screenshot_from_2016_06_20_21_06_09.png
>> 3.7: https://s31.postimg.org/xak32s9m3/Screenshot_from_2016_06_20_21_05_57.png
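[Editor's note] The workaround and diagnostic steps discussed in this thread can be sketched as the following shell/config fragment. File paths assume a package install of Cassandra, and the heap values are the ones mentioned in the thread; treat this as an illustrative sketch, not an exact recipe for every distribution.

```shell
# 1. Raise the JVM heap on 3.0.x, as the thread does (cassandra-env.sh):
#      MAX_HEAP_SIZE="16G"
#    Heap settings here override the auto-calculated defaults.

# 2. On 3.6+/3.7, Robert's suggested check for CASSANDRA-11206: effectively
#    disable the shallow index cache in cassandra.yaml and re-run repair.
#    If heap usage then matches 3.0.3, 11206 is the likely explanation.
#      column_index_cache_size_in_kb: 10000000

# 3. Repair invocations tried in the thread (restart Cassandra after any
#    config change, then run one of):
nodetool repair -pr      # repair only this node's primary token ranges
nodetool repair -local   # restrict repair to the local datacenter
```

These commands act on a live cluster, so heap behaviour is best observed alongside them with VisualVM or `nodetool info`, as the original poster did.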