Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of watanabe.maki@gmail.com
 designates 209.85.210.44 as permitted sender)
References: 
 <CAFDWQMRwKqa9QV6KpAfGo1BJHRwhjN0U_8RBmAYx_WejsO249Q@mail.gmail.com>
In-Reply-To: 
 <CAFDWQMRwKqa9QV6KpAfGo1BJHRwhjN0U_8RBmAYx_WejsO249Q@mail.gmail.com>
Mime-Version: 1.0 (1.0)
Content-Transfer-Encoding: 7bit
Content-Type: multipart/alternative;
	boundary=Apple-Mail-F09D8A6F-8A9F-4E10-A8BE-3E3967C09833
Message-Id: <5EDC8322-D23C-46B8-8DAC-2551962ECF24@gmail.com>
Cc: "user@cassandra.apache.org" <user@cassandra.apache.org>
From: Watanabe Maki <watanabe.maki@gmail.com>
Subject: Re: Simulating a failed node
Date: Sun, 28 Oct 2012 13:36:36 +0900
To: "user@cassandra.apache.org" <user@cassandra.apache.org>


--Apple-Mail-F09D8A6F-8A9F-4E10-A8BE-3E3967C09833
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

What RF and CL are you using?
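
The reason I ask: whether a write survives a downed node depends on how many replicas of the key are still alive compared with how many the consistency level requires. If the coordinator sees too few live replicas it rejects the write up front with UnavailableException, and retrying does not help while the node stays down. Below is a rough sketch of that arithmetic in Python. It is not Cassandra code; the function names are made up for illustration, and the RF/CL values in the examples are assumptions, since I don't know your actual settings yet.

```python
def required_replicas(cl: str, rf: int) -> int:
    """Live replicas needed to accept a write at consistency level `cl`.

    Hypothetical helper for illustration only. CL ANY is left out
    because it can be satisfied by a hint rather than a live replica.
    """
    cl = cl.upper()
    if cl == "ONE":
        return 1
    if cl == "TWO":
        return 2
    if cl == "THREE":
        return 3
    if cl == "QUORUM":
        return rf // 2 + 1  # majority of the replicas
    if cl == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {cl}")


def write_is_available(cl: str, rf: int, live_replicas: int) -> bool:
    """True if enough replicas of the key are alive for the write to be accepted."""
    return live_replicas >= required_replicas(cl, rf)


# Assumed case 1: RF=1, CL=ONE, and the only replica of the key is the downed node.
print(write_is_available("ONE", rf=1, live_replicas=0))

# Assumed case 2: RF=3, CL=QUORUM, one of the three replicas is down.
print(write_is_available("QUORUM", rf=3, live_replicas=2))
```

With RF=1 every key lives on exactly one node, so roughly a third of the keys in a 3 node cluster have no live replica once a node is down, no matter which nodes the client connects to. With RF=3 and CL ONE or QUORUM the same failure should be tolerated. So it is worth checking what RF the stress keyspace was created with.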


On 2012/10/28, at 13:13, Andrew Bialecki <andrew.bialecki@gmail.com> wrote:

> Hey everyone,
>
> I'm trying to simulate what happens when a node goes down to make sure
> my cluster can gracefully handle node failures. For my setup I have a
> 3 node cluster running 1.1.5. I'm then using the stress tool included
> in 1.1.5 coming from an external server and running it with the
> following arguments:
>
> tools/bin/cassandra-stress -d <server1>,<server2>,<server3> -n 1000000
>
> I start up the stress test and then down one of the nodes. The stress
> test instantly fails with the following errors (which of course are the
> same error from different threads) looking like:
>
>           ...
> Operation [158320] retried 10 times - error inserting key 0158320 ((UnavailableException))
> Operation [158429] retried 10 times - error inserting key 0158429 ((UnavailableException))
> Operation [158439] retried 10 times - error inserting key 0158439 ((UnavailableException))
> Operation [158470] retried 10 times - error inserting key 0158470 ((UnavailableException))
> 158534,0,0,NaN,43
> FAILURE
>
> I'm sure my naive setup is flawed in some way, but what I was hoping
> for was when the node went down it would fail to write to the downed
> node and instead write to one of the other nodes in the clusters. So
> question is why are writes failing even after a retry? It might be the
> stress client doesn't pool connections (I took a quick look, but
> might've not looked deeply enough), however I also tried only
> specifying the first two server nodes and then downing the third with
> the same failure.
>
> Thanks in advance.
>
> Andrew

--Apple-Mail-F09D8A6F-8A9F-4E10-A8BE-3E3967C09833--