From: Michael Fong <michael.fong@ruckuswireless.com>
To: user@cassandra.apache.org, dev@cassandra.apache.org
Subject: RE: Cassandra 2.0.x OOM during startup - schema version inconsistency after reboot
Date: Mon, 9 May 2016 03:48:08 +0000
Hi, all,

Haven't heard any responses so far, and this issue has troubled us for quite some time. Here is another update:

We have noticed several times that the schema version may change after migration and reboot. Here is the scenario:

1. Two-node cluster (node 1 & node 2).
2. Some schema changes are applied, e.g. creating a few new column families. The cluster waits until both nodes have their schema versions in sync (describe cluster) before moving on.
3. Right before node 2 is rebooted, the schema version is consistent; however, after node 2 reboots and starts servicing, the MigrationManager gossips a different schema version.
4. Afterwards, both nodes exchange schema messages indefinitely until one of the nodes dies.

We currently suspect the schema change is caused by replaying old entries from the commit log. We wish to keep digging, but need expert help on this.
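For anyone trying to reproduce or confirm the disagreement: we check schema agreement with nodetool from either node. The output below is only a sketch - the UUIDs are the ones from the logs further down, but the cluster name, snitch, and which address owns which version are illustrative placeholders:

    $ nodetool -h 192.168.88.34 describecluster
    Cluster Information:
            Name: Test Cluster
            Snitch: org.apache.cassandra.locator.SimpleSnitch
            Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
            Schema versions:
                    4cb463f8-5376-3baf-8e88-a5cc6a94f58f: [192.168.88.34]
                    f5270873-ba1f-39c7-ab2e-a86db868b09b: [192.168.88.33]

Two entries under "Schema versions" means the nodes disagree; a healthy cluster shows a single version owning all addresses.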
I don't know if anyone has seen this before, or whether there is anything wrong with our migration flow.

Thanks in advance.

Best regards,

Michael Fong

From: Michael Fong [mailto:michael.fong@ruckuswireless.com]
Sent: Thursday, April 21, 2016 6:41 PM
To: user@cassandra.apache.org; dev@cassandra.apache.org
Subject: RE: Cassandra 2.0.x OOM during bootstrap

Hi, all,

Here is some more information on what happened before the OOM on the rebooted node in a 2-node test cluster:

1. It seems the schema version changed on the rebooted node after reboot, i.e.

Before reboot:
Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f

After rebooting node 2:
Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b

2. After reboot, both nodes repeatedly send MigrationTask to each other - we suspect this is related to the schema version (digest) mismatch after node 2 rebooted:

Node 2 keeps submitting the migration task 100+ times to the other node:

INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node /192.168.88.33 has restarted, now UP
INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
INFO [GossipStage:1] 2016-04-19 11:18:18,263 StorageService.java (line 1544) Node /192.168.88.33 state jump to normal
INFO [GossipStage:1] 2016-04-19 11:18:18,264 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) Can't send schema pull request: node /192.168.88.33 is down.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) Can't send schema pull request: node /192.168.88.33 is down.
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.33
INFO [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 978) InetAddress /192.168.88.33 is now UP
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.33
INFO [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 978) InetAddress /192.168.88.33 is now UP
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.33
INFO [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 978) InetAddress /192.168.88.33 is now UP
DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,356 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
.....
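As a side note, the schema version each node is currently gossiping can also be read without grepping the logs. Two ways we know of - the exact output shape may differ slightly by version, so treat the excerpt as illustrative:

    # Gossip application state per endpoint, including the SCHEMA digest
    $ nodetool gossipinfo
    /192.168.88.33
      ...
      SCHEMA:f5270873-ba1f-39c7-ab2e-a86db868b09b

    # Local and peer schema versions as recorded in the system keyspace
    cqlsh> SELECT schema_version FROM system.local;
    cqlsh> SELECT peer, schema_version FROM system.peers;

If the gossiped SCHEMA value and the system.local value diverge across nodes right after reboot, that matches the mismatch in the log excerpts here.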
On the other hand, Node 1 keeps updating its gossip information, followed by receiving and submitting migration tasks:

DEBUG [RequestResponseStage:3] 2016-04-19 11:18:18,332 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.34
INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
DEBUG [RequestResponseStage:4] 2016-04-19 11:18:18,335 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.34
INFO [RequestResponseStage:4] 2016-04-19 11:18:18,335 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
DEBUG [RequestResponseStage:3] 2016-04-19 11:18:18,335 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.34
INFO [RequestResponseStage:3] 2016-04-19 11:18:18,335 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
......
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,595 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,843 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,878 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
......
DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
.....

Has anyone experienced this scenario? Thanks in advance!

Sincerely,

Michael Fong

From: Michael Fong [mailto:michael.fong@ruckuswireless.com]
Sent: Wednesday, April 20, 2016 10:43 AM
To: user@cassandra.apache.org; dev@cassandra.apache.org
Subject: Cassandra 2.0.x OOM during bootstrap

Hi, all,

We have recently encountered a Cassandra OOM issue when Cassandra is brought up sometimes (but not always) in our 4-node cluster test bed.

After analyzing the heap dump, we found the Internal-Response thread pool (JMXEnabledThreadPoolExecutor) filled with thousands of 'org.apache.cassandra.net.MessageIn' objects, occupying more than 2 GB of heap memory.

According to documentation on the internet, the internal-response thread pool seems to be related to schema checking. Has anyone encountered a similar issue before?

We are using Cassandra 2.0.17 and JDK 1.8. Thanks in advance!

Sincerely,

Michael Fong
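P.S. In case it helps others triage the same symptom, this is roughly how we spotted the buildup. Both commands are standard nodetool/JDK tooling, but the pool name column layout and the numbers shown are illustrative, not copied from our cluster:

    # Watch the internal response stage backlog grow over time
    $ nodetool tpstats | grep -i internal
    InternalResponseStage             1    120000    ...

    # Count live MessageIn instances on the heap (substitute the Cassandra JVM pid)
    $ jmap -histo:live <pid> | grep org.apache.cassandra.net.MessageIn

A Pending count that climbs without bound on InternalResponseStage, together with MessageIn dominating the histogram, is the pattern we saw before the OOM.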