From: "Peddi, Praveen" <peddi@amazon.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Removing Node causes bunch of HostUnavailableException
Date: Tue, 15 Mar 2016 17:55:15 +0000

Hi Alain,
Sorry I completely missed your email until my colleague pointed it out.

From the testing we have done so far, we still have this issue when removing nodes on 2.0.9 but not on 2.2.4. We will be upgrading to 2.2.4 pretty soon, so I am not too worried about errors in 2.0.9. Since we haven't changed any code on our side between 2.0.9 and 2.2.4, I am guessing this is probably a bug in 2.0.9, and since 2.0.9 is not supported anymore, I didn't create a jira ticket for it.

Thanks for following up though.

Praveen



From: Alain RODRIGUEZ <arodrime@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, March 10, 2016 at 5:30 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Removing Node causes bunch of HostUnavailableException

Hi Praveen, how is this going?

I have been out for a while, did you manage to remove the nodes? Do you need more help? If so, I could use a status update and more information about the remaining issues.

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
2016-03-04 19:39 GMT+01:00 Peddi, Praveen <peddi@amazon.com>:
Hi Jack,
My answers below…

What is the exact exception you are getting and where do you get it? Is it UnavailableException or NoHostAvailableException and does it occur on the client, using the Java driver?
We saw different types of exceptions. The ones I could quickly grep are:
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency SERIAL (2 replica were required but only 1 acknowledged the write)
com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency QUORUM (2 required but only 1 alive)
QueryTimeoutException
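For reference, all of these surface on the client side through the DataStax Java driver. Below is a minimal, self-contained sketch of where they are thrown; the contact point, keyspace and table names are placeholders for illustration, not our actual code.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.QueryTimeoutException;
import com.datastax.driver.core.exceptions.UnavailableException;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class WriteErrorDemo {
    public static void main(String[] args) {
        // Contact point and keyspace are placeholders for illustration only.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {
            try {
                session.execute("INSERT INTO jobs (id, status) VALUES (uuid(), 'NEW')");
            } catch (UnavailableException e) {
                // Coordinator knew up front there were not enough live replicas for the requested CL.
                System.err.println("Unavailable: " + e.getMessage());
            } catch (WriteTimeoutException e) {
                // Replicas were considered alive but did not acknowledge in time.
                System.err.println("Write timeout: " + e.getMessage());
            } catch (QueryTimeoutException e) {
                // Parent of the read/write timeout exceptions reported above.
                System.err.println("Timeout: " + e.getMessage());
            }
        }
    }
}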



What is your LoadBalancingPolicy?
new TokenAwarePolicy(new RoundRobinPolicy())

What consistency level is the client using?
QUORUM for reads. For writes, some APIs use SERIAL and some use QUORUM, depending on whether we want to do optimistic locking.
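To illustrate that split (the table, column and key names here are invented): plain statements run at QUORUM, while the optimistic-locking paths use a conditional write whose Paxos phase is what gets reported at consistency SERIAL, as in the WriteTimeoutException above.

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ConsistencyDemo {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {

            // Plain read at QUORUM (2 of 3 replicas with RF=3).
            BoundStatement read = session.prepare("SELECT status FROM jobs WHERE id = ?").bind("job-1");
            read.setConsistencyLevel(ConsistencyLevel.QUORUM);
            Row row = session.execute(read).one();
            System.out.println("status: " + (row == null ? "n/a" : row.getString("status")));

            // Optimistic locking: conditional update; the Paxos phase runs at SERIAL.
            BoundStatement cas = session.prepare("UPDATE jobs SET status = ? WHERE id = ? IF status = ?")
                    .bind("RUNNING", "job-1", "NEW");
            cas.setConsistencyLevel(ConsistencyLevel.QUORUM);        // commit phase
            cas.setSerialConsistencyLevel(ConsistencyLevel.SERIAL);  // Paxos phase
            ResultSet rs = session.execute(cas);
            System.out.println("applied: " + rs.one().getBool("[applied]"));
        }
    }
}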

What retry policy is the client using?
Default Retry Policy
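Putting the three answers above together, the client wiring would look roughly like the sketch below. The contact points are placeholders and this mirrors the configuration as described; in a multi-AZ EC2 deployment, wrapping a DCAwareRoundRobinPolicy in TokenAwarePolicy is a common alternative to the plain RoundRobinPolicy.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DefaultRetryPolicy;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class ClientFactory {
    // Builds a Cluster with the load balancing, retry and consistency settings described above.
    public static Cluster build(String... contactPoints) {
        return Cluster.builder()
                .addContactPoints(contactPoints)
                .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
                .withRetryPolicy(DefaultRetryPolicy.INSTANCE)           // the driver's default policy
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.QUORUM))  // default CL for statements
                .build();
    }

    public static void main(String[] args) {
        // Placeholder addresses for illustration only.
        try (Cluster cluster = build("10.0.0.1", "10.0.0.2");
             Session session = cluster.connect()) {
            System.out.println("Connected to " + cluster.getMetadata().getClusterName());
        }
    }
}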


When you say that the failures don't last for more than a few minutes, you mean from the moment you perform the nodetool removenode? And is operation completely normal after those few minutes?
That is correct. All operations recover from failures after a few minutes.



-- Jack Krupansky

On Thu, Mar 3, 2016 at 4:40 PM, Peddi, Praveen <peddi@amazon.com> wrote:
Hi Jack,
Which node(s) were getting the HostNotAvailable errors - all nodes for every query, or just a small portion of the nodes on some queries?
Not all reads/writes are failing with Unavailable or Timeout exceptions. Write failures were around 10% of total calls. Reads were a little worse (as bad as 35% of total calls).


It may take some time for the gossip state to propagate; maybe some of it is corrupted or needs a full refresh.

Were any of the seed nodes in the collection of nodes that were removed? How many seed nodes does each node typically have?
We currently use all hosts as seed hosts, which I know is a very bad idea, and we are going to fix that soon. The reason we use all hosts as seed hosts is that these hosts can get recycled for many reasons and we didn't want to hard-code the host names, so we programmatically get host names (we wrote our own seed host provider). Could that be the reason for these failures? If a dead node is in the seed nodes list and we try to remove that node, could that lead to a blip of failures? The failures don't last for more than a few minutes.
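For context, a seed provider in Cassandra 2.x is simply a class implementing org.apache.cassandra.locator.SeedProvider, which Cassandra constructs reflectively with the parameters map from the seed_provider section of cassandra.yaml. A minimal sketch of the kind of dynamic provider described here might look like the following; the class name and discovery hostname are hypothetical, and the real implementation presumably asks an internal discovery service rather than DNS.

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.cassandra.locator.SeedProvider;

public class DynamicSeedProvider implements SeedProvider {
    private final String discoveryName;

    // Cassandra constructs the provider with the "parameters" map from cassandra.yaml.
    public DynamicSeedProvider(Map<String, String> parameters) {
        String name = parameters.get("discovery_name");
        this.discoveryName = (name != null) ? name : "cassandra.internal.example.com";
    }

    @Override
    public List<InetAddress> getSeeds() {
        try {
            // Resolve the current fleet; every live node ends up listed as a seed,
            // which is the "all hosts are seeds" situation described above.
            return new ArrayList<InetAddress>(Arrays.asList(InetAddress.getAllByName(discoveryName)));
        } catch (UnknownHostException e) {
            // Better to start with no seeds than to fail node startup outright.
            return Collections.<InetAddress>emptyList();
        }
    }
}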


-- Jack Krupansky

On Thu, Mar 3, 2016 at 4:16 PM, Peddi, Praveen <peddi@amazon.com> wrote:
Thanks Alain for the quick and detailed response. My answers inline. One thing I want to clarify is that the nodes got recycled due to an automatic health check failure. This means old nodes are dead and new nodes got added without our intervention, so replacing nodes would not work for us since the new nodes were already added.

 
We are not removing multiple nodes at the same time. All dead nodes are from the same AZ, so there were no errors when the nodes were down, as expected (because we use QUORUM).

Do you use at least 3 distinct AZs? If so, you should indeed be fine regarding data integrity. Also, repair should then work for you. If you have fewer than 3 AZs, then you are in trouble...
Yes, we use 3 distinct AZs and replicate to all 3 AZs, which is why when 8 nodes were recycled there was absolutely no outage on Cassandra (the other two nodes still satisfy quorum consistency).

About the unreachable errors, I believe they can be due to the overload caused by the missing nodes. Pressure on the remaining nodes might be too strong.
It is certainly possible, but we have a beefed-up cluster with <3% CPU and hardly any network I/O or disk usage. We have 162 nodes in the cluster and each node doesn't have more than 80 to 100 MB of data.


However, as soon as I started removing nodes one by one, every time we see a lot of timeout and unavailable exceptions, which doesn't make any sense because I am just removing a node that doesn't even exist.

This probably added even more load: if you are using vnodes, all the remaining nodes probably started streaming data to each other at the speed given by "nodetool getstreamthroughput". The AWS network isn't that good, and is probably saturated. Also, do you have phi_convict_threshold configured to a high value, at least 10 or 12? This would avoid nodes being marked down that often.
We are using c3.2xlarge, which has good network throughput (1 GB/sec I think). We are using the default value, which is 200 MB/sec in 2.0.9. We will play with it in the future and see if this could make any difference, but as I mentioned, the data size on each node is not huge.
Regarding phi_convict_threshold, our Cassandra is not bringing itself down. There was a bug in the health check from one of our internal tools, and that tool is recycling the nodes. Nothing to do with Cassandra health. Again, we will keep an eye on it in the future.


What does "nodetool tpstats" output?
Nodetool tpstats on which node? Any node?


Also, you might try to monitor resources and see what happens (my guess is you should focus on iowait, disk usage and network, and keep an eye on CPU too).
We did monitor CPU, disk and network, and they are all very low.


A quick fix would probably be to heavily throttle the network on all the nodes and see if it helps:

nodetool setstreamthroughput 2
We will play with this config. 2.0.9 defaults to 200 MB/sec, which I think is too high.

If this works, you could incrementally increase it and monitor, find the right tuning, and put it in cassandra.yaml. I opened a ticket a while ago about that issue: https://issues.apache.org/jira/browse/CASSANDRA-9509
I voted for this issue. Let's see if it gets picked up :).

I hope this will help you to go back to a healthy state, allowing you a fast upgrade ;-).

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-02 22:17 GMT+01:00 Peddi, Praveen <peddi@amazon.com>:
Hi Robert,
Thanks for your response.

Replication factor is 3.

We are in the process of upgrading to 2.2.4. We have had too many performance issues with later versions of Cassandra (I have asked for help related to that in the forum). We are close to getting to similar performance now and hope to upgrade in the next few weeks. Lots of testing to do :(.

We are not removing multiple nodes at the same time. All dead nodes ar= e from same AZ so there were no errors when the nodes were down as expected= (because we use QUORUM). However, As soon as I started removing nodes one = by one, every time time we see lot of timeout and unavailable exceptions which doesn=92t make any sense becau= se I am just removing a node that doesn=92t even exist.










