Date: Wed, 7 Oct 2015 10:13:27 +0000 (UTC)
From: "Robbie Strickland (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

    [ https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946624#comment-14946624 ]

Robbie Strickland commented on CASSANDRA-10449:
-----------------------------------------------

I increased max heap to 96GB and tried again. Now doing netstats shows progress ground to a halt:

9pm:
{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
    /52.1.155.147 (using /10.239.209.15)
        Receiving 139 files, 36548040412 bytes total. Already received 139 files, 36548040412 bytes total
    /52.2.9.34 (using /10.239.209.17)
        Receiving 171 files, 60000431853 bytes total. Already received 171 files, 60000431853 bytes total
    /52.0.152.88 (using /10.239.209.44)
        Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db 955162267/4105438496 bytes(23%) received from idx:0/52.0.152.88
    /52.2.0.164 (using /10.239.209.16)
        Receiving 141 files, 36700837768 bytes total. Already received 141 files, 36700837768 bytes total
    /54.152.177.161 (using /10.239.209.93)
    /54.172.174.48 (using /10.239.209.49)
        Receiving 176 files, 79676288976 bytes total. Already received 98 files, 55932809644 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db 174070078/7326235809 bytes(2%) received from idx:0/54.172.174.48
    /52.2.75.82 (using /10.239.208.88)
    /54.165.111.69 (using /10.239.209.47)
        Receiving 170 files, 85920995638 bytes total. Already received 94 files, 54985226700 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db 4875660361/22821083384 bytes(21%) received from idx:0/54.165.111.69
    /52.6.136.30 (using /10.239.209.45)
        Receiving 174 files, 87064163973 bytes total. Already received 91 files, 53930233899 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db 17064156850/25823860172 bytes(66%) received from idx:0/52.6.136.30
    /52.7.14.201 (using /10.239.209.46)
        Receiving 164 files, 46351636573 bytes total. Already received 164 files, 46351636573 bytes total
    /52.2.30.66 (using /10.239.209.18)
        Receiving 158 files, 62899520151 bytes total. Already received 158 files, 62899520151 bytes total
    /54.175.138.33 (using /10.239.209.96)
    /54.88.44.178 (using /10.239.209.91)
    /52.2.109.194 (using /10.239.208.89)
    /54.172.81.117 (using /10.239.209.95)
    /54.172.103.46 (using /10.239.209.48)
        Receiving 164 files, 48771232182 bytes total. Already received 164 files, 48771232182 bytes total
    /54.164.172.164 (using /10.239.209.94)
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a        19             56
Responses                       n/a         0       35515795
{noformat}

6am:
{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
    /52.1.155.147 (using /10.239.209.15)
        Receiving 139 files, 36548040412 bytes total. Already received 139 files, 36548040412 bytes total
    /52.2.9.34 (using /10.239.209.17)
        Receiving 171 files, 60000431853 bytes total. Already received 171 files, 60000431853 bytes total
    /52.0.152.88 (using /10.239.209.44)
        Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db 955162267/4105438496 bytes(23%) received from idx:0/52.0.152.88
    /52.2.0.164 (using /10.239.209.16)
        Receiving 141 files, 36700837768 bytes total. Already received 141 files, 36700837768 bytes total
    /54.152.177.161 (using /10.239.209.93)
    /54.172.174.48 (using /10.239.209.49)
        Receiving 176 files, 79676288976 bytes total. Already received 98 files, 55932809644 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db 174070078/7326235809 bytes(2%) received from idx:0/54.172.174.48
    /52.2.75.82 (using /10.239.208.88)
    /54.165.111.69 (using /10.239.209.47)
        Receiving 170 files, 85920995638 bytes total. Already received 94 files, 54985226700 bytes total
            /var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db 4875660361/22821083384 bytes(21%) received from idx:0/54.165.111.69
    /52.6.136.30 (using /10.239.209.45)
        Receiving 174 files, 87064163973 bytes total. Already received 91 files, 53930233899 bytes total
            /var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db 17064156850/25823860172 bytes(66%) received from idx:0/52.6.136.30
    /52.7.14.201 (using /10.239.209.46)
        Receiving 164 files, 46351636573 bytes total. Already received 164 files, 46351636573 bytes total
    /52.2.30.66 (using /10.239.209.18)
        Receiving 158 files, 62899520151 bytes total. Already received 158 files, 62899520151 bytes total
    /54.175.138.33 (using /10.239.209.96)
    /54.88.44.178 (using /10.239.209.91)
    /52.2.109.194 (using /10.239.208.89)
    /54.172.81.117 (using /10.239.209.95)
    /54.172.103.46 (using /10.239.209.48)
        Receiving 164 files, 48771232182 bytes total. Already received 164 files, 48771232182 bytes total
    /54.164.172.164 (using /10.239.209.94)
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a        19             56
Responses                       n/a         0       51933813
{noformat}

No additional long GC pauses.

> OOM on bootstrap due to long GC pause
> -------------------------------------
>
>                 Key: CASSANDRA-10449
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu 14.04, AWS
>            Reporter: Robbie Strickland
>              Labels: gc
>             Fix For: 2.1.x
>
>         Attachments: system.log.10-05
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 500-700GB per node. SSTable counts are <10 per table. I am attempting to provision additional nodes, but bootstrapping OOMs every time after about 10 hours with a sudden long GC pause:
> {noformat}
> INFO [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old Generation GC in 1586126ms. G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 CassandraDaemon.java:223 - Exception in thread Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to no avail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
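
The 9pm and 6am netstats snapshots quoted in the comment can also be compared mechanically rather than by eye. The following is a minimal sketch, not part of the original report: the function names `in_flight` and `stalled` are hypothetical, and the regular expression assumes the in-flight file line format exactly as it appears in the output pasted in this message (other Cassandra versions may print it differently).

```python
import re

# Matches in-flight file lines as they appear in the pasted output, e.g.
#   <path>-Data.db 955162267/4105438496 bytes(23%) received from idx:0/52.0.152.88
# Assumption: this layout is taken from the message above, not from a spec.
PROGRESS = re.compile(r"(\S+-Data\.db)\s+(\d+)/(\d+) bytes\(\d+%\) received from")


def in_flight(netstats_text):
    """Map each partially received file to (bytes_received, bytes_total)."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in PROGRESS.finditer(netstats_text)
    }


def stalled(earlier, later):
    """Return files present in both snapshots whose received byte count did not change."""
    before, after = in_flight(earlier), in_flight(later)
    return sorted(
        path for path in before
        if path in after and before[path][0] == after[path][0]
    )


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python stalled.py netstats_9pm.txt netstats_6am.txt
    with open(sys.argv[1]) as f_early, open(sys.argv[2]) as f_late:
        for path in stalled(f_early.read(), f_late.read()):
            print("no progress:", path)
```

Run against the two snapshots in this message, every in-flight file has the same received byte count at 9pm and 6am, so each of the four partially received files would be reported.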