cassandra-user mailing list archives

From Andrew Bialecki <andrew.biale...@gmail.com>
Subject Re: Simulating a failed node
Date Sun, 28 Oct 2012 04:46:31 GMT
The default replication factor and consistency level for the stress tool are
both one, so that's what I'm using. I've also experimented and seen the same
behavior with RF=2, but I haven't tried a different CL.
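
To make the RF/CL question concrete, here is a rough sketch of the replica counting involved. It assumes a simplified ring (one token per node, replicas placed on consecutive nodes); the function names are made up for illustration and are not part of Cassandra or the stress tool. A write is rejected as unavailable when fewer replicas are alive than the consistency level requires.

```python
# Simplified model: one token per node, replicas placed on consecutive nodes
# around the ring. All names here are illustrative assumptions, not a real API.

def required_replicas(cl, rf):
    """Number of live replicas a write needs for a given consistency level."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

def unavailable_fraction(num_nodes, rf, cl, down):
    """Fraction of token ranges whose writes would be rejected as unavailable."""
    needed = required_replicas(cl, rf)
    rejected = 0
    for primary in range(num_nodes):
        # Replicas for this range: the primary node plus the next rf - 1 nodes.
        replicas = [(primary + i) % num_nodes for i in range(rf)]
        live = sum(1 for node in replicas if node not in down)
        if live < needed:
            rejected += 1
    return rejected / num_nodes

if __name__ == "__main__":
    # 3-node cluster with node 0 down, as in the test described in this thread.
    for rf, cl in [(1, "ONE"), (2, "ONE"), (2, "QUORUM"), (3, "QUORUM")]:
        frac = unavailable_fraction(3, rf, cl, down={0})
        print(f"RF={rf} CL={cl}: {frac:.2f} of ranges unavailable")
```

Under this model, RF=1 with one node down leaves a third of the ranges with no live replica, so inserts for those keys cannot succeed no matter which node the client connects to. The model only counts replicas, so it does not by itself account for seeing the same failure with RF=2.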

On Sun, Oct 28, 2012 at 12:36 AM, Watanabe Maki <watanabe.maki@gmail.com> wrote:

> What RF and CL are you using?
>
>
> On 2012/10/28, at 13:13, Andrew Bialecki <andrew.bialecki@gmail.com>
> wrote:
>
> Hey everyone,
>
> I'm trying to simulate what happens when a node goes down to make sure my
> cluster can gracefully handle node failures. For my setup I have a 3-node
> cluster running 1.1.5. I'm running the stress tool included in 1.1.5 from
> an external server with the following arguments:
>
> tools/bin/cassandra-stress -d <server1>,<server2>,<server3> -n 1000000
>
>
> I start up the stress test and then take down one of the nodes. The stress
> test instantly fails with errors like the following (which are of course
> the same error from different threads):
>
>           ...
>
> Operation [158320] retried 10 times - error inserting key 0158320
> ((UnavailableException))
> Operation [158429] retried 10 times - error inserting key 0158429
> ((UnavailableException))
> Operation [158439] retried 10 times - error inserting key 0158439
> ((UnavailableException))
> Operation [158470] retried 10 times - error inserting key 0158470
> ((UnavailableException))
> 158534,0,0,NaN,43
> FAILURE
>
>
> I'm sure my naive setup is flawed in some way, but what I was hoping for
> was that when the node went down, writes to the downed node would fail and
> go to one of the other nodes in the cluster instead. So the question is: why
> are writes failing even after a retry? It might be that the stress client
> doesn't pool connections (I took a quick look, but might not have looked
> deeply enough). However, I also tried specifying only the first two server
> nodes and then downing the third, with the same failure.
>
> Thanks in advance.
>
> Andrew
>
>
