From: Mike Torra
To: "user@cassandra.apache.org"
Subject: failing bootstraps with OOM
Date: Wed, 2 Nov 2016 14:35:45 +0000

Hi All -

I am trying to bootstrap a replacement node in a cluster, but it consistently fails to bootstrap because of OOM exceptions. For almost a week I've been going through cycles of bootstrapping, finding errors, then restarting / resuming bootstrap, and I am struggling to move forward. Sometimes the bootstrapping node itself fails, which usually manifests first as very high GC times (sometimes 30s+!), then nodetool commands start to fail with timeouts, then the node crashes with an OOM exception. Other times, a node streaming data to this bootstrapping node will have a similar failure. In either case, when it happens I need to restart the crashed node, then resume the bootstrap.
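
The "very high GC times" symptom described above can be checked mechanically. Below is a minimal sketch, assuming Cassandra's GCInspector writes lines containing `<collector> GC in <N>ms` to system.log. The function name, the 1-second threshold, and the sample lines are made up for illustration and are not taken from this cluster.

```python
import re

# Assumed line shape (not copied from the poster's logs):
#   "... GCInspector.java:NNN - <collector name> GC in <N>ms. ..."
GC_PAUSE = re.compile(r"GCInspector.*?-\s+(?P<collector>.+?) GC in (?P<ms>\d+)ms")


def long_gc_pauses(lines, threshold_ms=1000):
    """Return (collector, pause_ms) for every GC pause at or above threshold_ms."""
    pauses = []
    for line in lines:
        match = GC_PAUSE.search(line)
        if match is None:
            continue  # not a GC report line
        pause_ms = int(match.group("ms"))
        if pause_ms >= threshold_ms:
            pauses.append((match.group("collector"), pause_ms))
    return pauses


# Hypothetical sample lines, written only to exercise the parser.
sample = [
    "INFO  [Service Thread] 2016-11-02 14:01:07,123 GCInspector.java:284 - "
    "G1 Young Generation GC in 212ms.  G1 Eden Space: 1 -> 0;",
    "WARN  [Service Thread] 2016-11-02 14:03:55,456 GCInspector.java:282 - "
    "G1 Old Generation GC in 30512ms.  G1 Old Gen: 1 -> 1;",
    "INFO  [main] 2016-11-02 14:04:00,000 StorageService.java:1 - unrelated line",
]

print(long_gc_pauses(sample))
```

In practice the `sample` list would be replaced by the lines of the node's own log file. A run of long pauses shortly before a crash would match the failure pattern described here.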

On top of these issues, when I do need to restart a node it takes a loooong time (http://stackoverflow.com/questions/40141739/why-does-cassandra-sometimes-take-a-hours-to-start). This exacerbates the problem because it takes so long to find out whether a change to the cluster helps or it still fails. I am in the process of upgrading all nodes in the cluster from m4.xlarge to c4.4xlarge, and I am running Cassandra DDC 3.5 on all nodes. The cluster has 26 nodes spread across 4 regions in EC2. Here is some other relevant cluster info (also in the Stack Overflow post):

Cluster Info

  • Cassandra DDC 3.5
  • EC2MultiRegionSnitch
  • m4.xlarge, moving to c4.4xlarge

Schema Info

  • 3 CF's, all 'write once' (i.e. no updates), 1 week TTL, STCS (default)
  • no secondary indexes
I am unsure what to try next. The node that is currently having this bootstrap problem is a pretty beefy box, with 16 cores, 30G of RAM, and a 3.2T EBS volume. The slow startup time might be because of the issues with a high number of SSTables that Jeff Jirsa mentioned in a comment on the SO post, but I am at a loss for the OOM issues. I've tried:
  • Changing from CMS to G1 GC, which seemed to help a bit
  • Upgrading from 3.5 to 3.9, which did not seem to help
  • Upgrading instance types from m4.xlarge to c4.4xlarge, which seems to help, but I'm still having issues
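
One possible reason the larger instances help is the JVM heap size. The post does not say how the heap is configured, so this is a sketch under an assumption: that the heap is left to the stock cassandra-env.sh auto-sizing rule, `max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))`. The function name is hypothetical and the memory figures are the nominal instance sizes (16 GiB for m4.xlarge, 30 GiB for c4.4xlarge). The OS reports slightly less, so real values would be a little lower.

```python
def default_max_heap_mb(system_memory_mb):
    """Approximate the stock cassandra-env.sh rule for MAX_HEAP_SIZE, in MB."""
    half_capped = min(system_memory_mb // 2, 1024)     # 1/2 of RAM, capped at 1024 MB
    quarter_capped = min(system_memory_mb // 4, 8192)  # 1/4 of RAM, capped at 8192 MB
    return max(half_capped, quarter_capped)


# Nominal instance memory; an assumption, not measured on this cluster.
for instance_type, memory_mb in [("m4.xlarge", 16 * 1024), ("c4.4xlarge", 30 * 1024)]:
    print(instance_type, default_max_heap_mb(memory_mb), "MB")
```

Under that assumption the default heap would roughly double when moving from m4.xlarge to c4.4xlarge, while still staying below the 8 GB cap. If the heap is set explicitly, this calculation does not apply.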
I'd appreciate any suggestions on what else I can try to track down the cause of these OOM exceptions.

- Mike