From: "Aleksey Yeschenko (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Date: Sun, 12 Jul 2015 15:10:05 +0000 (UTC)
Subject: [jira] [Updated] (CASSANDRA-9640) Nodetool repair of very wide, large rows causes GC pressure and destabilization

    [ https://issues.apache.org/jira/browse/CASSANDRA-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-9640:
-----------------------------------------
    Fix Version/s:     (was: 2.1.x)

> Nodetool repair of very wide, large rows causes GC pressure and destabilization
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9640
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9640
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: AWS, ~8GB heap
>            Reporter: Constance Eustace
>            Priority: Minor
>         Attachments: syslog.zip
>
>
> We've noticed our nodes becoming unstable, with large, unrecoverable old-gen GCs until OOM.
> This appears to happen around the time of repair, and the specific cause seems to be one of our report computation tables, which involves possibly very wide rows with 10GB of data in it. This is an RF 3 table in a four-node cluster.
> We truncate this table occasionally, and we also disabled the computation report for a while and noticed better node stability.
> I wish I had more specifics. We are switching to an RF 1 table and doing more proactive truncation of it.
> When things calm down, we will attempt to replicate the issue and watch GC and other logs.
> Any suggestions for things to look for or enable tracing on would be welcome.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)