From: Sotirios Delimanolis
Reply-To: user@cassandra.apache.org
To: daemeon reiydelle, Cassandra Mailing List
Date: Fri, 19 Feb 2016 23:01:12 +0000 (UTC)
Message-ID: <1319131001.89234.1455922872927.JavaMail.yahoo@mail.yahoo.com>
Subject: Re: Live upgrade 2.0 to 2.1 temporarily increases GC time causing timeouts and unavailability

We're not all the way there yet with native. But the increased GC time is temporary, only during the deployment. After all nodes are on 2.1, everything is smooth.


On Friday, February 19, 2016 1:47 PM, daemeon reiydelle <daemeonr@gmail.com> wrote:


FYI, my observations were with native, not Thrift.


.......


Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872


On Fri, Feb 19, 2016 at 10:12 AM, Sotirios Delimanolis <sotodel_89@yahoo.com> wrote:
Does your cluster contain 24+ nodes or fewer?

We did the same upgrade on a smaller cluster of 5 nodes and we didn't see this behavior. On the 24-node cluster, the timeouts only started once roughly 5 to 7 or more nodes had been upgraded.

We're doing some more upgrades next week, trying different deployment plans. I'll report back with the results.

Thanks for the reply (we absolutely want to move to CQL).


On Friday, February 19, 2016 1:10 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:


I performed this exact update a few days ago, except that clients were using the native protocol, and it went smoothly. So I think this might be Thrift-related. No idea what is producing this though, just wanted to give the info fwiw.

As a side note, unrelated to the issue, performance using native is a lot better than Thrift starting in C* 2.1. Drivers using native are also more modern, allowing you to do very interesting stuff. Updating to native now that you are using 2.1 is something you might want to do soon enough :-).

C*heers,
-----------------
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com

2016-02-19 3:07 GMT+01:00 Sotirios Delimanolis <sotodel_89@yahoo.com>:
We have a Cassandra cluster with 24 nodes. These nodes were running 2.0.16.

While the nodes are in the ring and handling queries, we perform the upgrade to 2.1.12 as follows (more or less) one node at a time:

  1. Stop the Cassandra process
  2. Deploy jars, scripts, binaries, etc.
  3. Start the Cassandra process
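
As a rough illustration only, the one-node-at-a-time procedure above could be scripted along these lines. This is a minimal Python sketch, not what was actually run: `rolling_upgrade` and the `stop`, `deploy`, `start`, and `is_healthy` callables are hypothetical placeholders, and the health check between nodes is an added assumption that the steps above do not mention.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    nodes: Iterable[str],
    stop: Callable[[str], None],
    deploy: Callable[[str], None],
    start: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade nodes strictly one at a time and return the upgraded nodes.

    All four callables are hypothetical hooks supplied by the caller;
    nothing here talks to Cassandra directly.
    """
    upgraded: List[str] = []
    for node in nodes:
        stop(node)    # 1. stop the Cassandra process on this node
        deploy(node)  # 2. deploy jars, scripts, binaries, etc.
        start(node)   # 3. start the Cassandra process again
        # Assumption (not in the original steps): do not move on to the
        # next node until this one reports healthy.
        if not is_healthy(node):
            raise RuntimeError(f"{node} did not come back healthy; halting rollout")
        upgraded.append(node)
    return upgraded
```

Passing the per-node actions in as callables keeps the ordering logic separate from however the process is really stopped, deployed, and started in a given environment.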

A few nodes into the upgrade, we start noticing that the majority of queries (mostly through Thrift) time out or report unavailable. Looking at system information, Cassandra GC time goes through the roof, which is what we assume causes the timeouts.
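
The thread does not say how GC time was measured. One way to put a number on it, assuming cumulative GC milliseconds are sampled periodically per node (for example from the JVM's garbage collector MXBeans), is to compute the share of wall-clock time spent in GC between consecutive samples. The function name and the sample format below are assumptions for illustration, not something taken from this cluster.

```python
from typing import List, Tuple


def gc_time_fraction(samples: List[Tuple[float, float]]) -> List[float]:
    """Return the fraction of wall-clock time spent in GC per interval.

    Each sample is (wall_clock_seconds, cumulative_gc_millis), in time
    order. This sample format is an assumption for the sketch.
    """
    fractions: List[float] = []
    for (t0, gc0), (t1, gc1) in zip(samples, samples[1:]):
        elapsed_ms = (t1 - t0) * 1000.0
        if elapsed_ms <= 0:
            raise ValueError("samples must be strictly increasing in time")
        # GC milliseconds accumulated in this interval, divided by the
        # interval length in milliseconds.
        fractions.append((gc1 - gc0) / elapsed_ms)
    return fractions
```

A node whose fraction jumps from a few percent to a large share of each interval during the rollout would match the behavior described above; comparing the series for 2.0 and 2.1 nodes could show which side is affected.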

Once all nodes are upgraded, the cluster stabilizes and barely any timeouts occur.

What could explain this? Does it have anything to do with how a 2.0 node communicates with a 2.1 node?

Our Cassandra consumers haven't changed.










