cassandra-user mailing list archives

From Watanabe Maki <watanabe.m...@gmail.com>
Subject Re: Simulating a failed node
Date Sun, 28 Oct 2012 04:36:36 GMT
What replication factor (RF) and consistency level (CL) are you using?
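
RF and CL matter here because a coordinator raises UnavailableException when fewer replicas of a key are alive than the consistency level requires. Below is a minimal sketch of that arithmetic, in Python for illustration only. The function names are hypothetical, and this is not Cassandra's or the stress tool's actual code. It assumes a single-datacenter cluster and covers only the ONE, QUORUM, and ALL levels.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Number of live replicas a request needs at the given consistency level."""
    cl = consistency_level.upper()
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        # A quorum is a strict majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if cl == "ALL":
        return replication_factor
    raise ValueError(f"consistency level not covered by this sketch: {consistency_level}")


def is_available(consistency_level: str, replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas are alive; False models an UnavailableException."""
    return live_replicas >= required_replicas(consistency_level, replication_factor)


if __name__ == "__main__":
    # Assumption: exactly one replica of the key sits on the downed node,
    # so rf - 1 replicas of that key remain alive.
    for rf in (1, 2, 3):
        for cl in ("ONE", "QUORUM", "ALL"):
            live = rf - 1
            status = "ok" if is_available(cl, rf, live) else "unavailable"
            print(f"RF={rf} CL={cl:<6} live={live} -> {status}")
```

With RF=1, a key whose only replica lives on the downed node cannot be written at any of these levels, no matter which node the client connects to. That would match the errors quoted below if the stress keyspace was created with RF=1, which is an assumption worth checking against the tool's defaults.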


On 2012/10/28, at 13:13, Andrew Bialecki <andrew.bialecki@gmail.com> wrote:

> Hey everyone,
> 
> I'm trying to simulate a node going down to make sure my cluster can gracefully handle
> node failures. My setup is a 3-node cluster running 1.1.5. I'm using the stress tool
> included in 1.1.5 from an external server, running it with the following arguments:
> 
> tools/bin/cassandra-stress -d <server1>,<server2>,<server3> -n 1000000
> 
> I start the stress test and then take down one of the nodes. The stress test instantly
> fails with errors like the following (the same error reported by different threads):
> 
>           ...
> Operation [158320] retried 10 times - error inserting key 0158320 ((UnavailableException))
> Operation [158429] retried 10 times - error inserting key 0158429 ((UnavailableException))
> Operation [158439] retried 10 times - error inserting key 0158439 ((UnavailableException))
> Operation [158470] retried 10 times - error inserting key 0158470 ((UnavailableException))
> 158534,0,0,NaN,43
> FAILURE
> 
> I'm sure my naive setup is flawed in some way, but I was hoping that when the node went
> down, writes to the downed node would fail and go to one of the other nodes in the
> cluster instead. So the question is: why are writes failing even after a retry? It might
> be that the stress client doesn't pool connections (I took a quick look, but might not
> have looked deeply enough). However, I also tried specifying only the first two server
> nodes and then downing the third, with the same failure.
> 
> Thanks in advance.
> 
> Andrew
