From: Watanabe Maki
Subject: Re: Simulating a failed node
Date: Sun, 28 Oct 2012 13:36:36 +0900
To: "user@cassandra.apache.org"

What RF and CL are you using?
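
To show why the answer matters, here is a rough sketch of the availability check a coordinator makes before accepting a write: if fewer replicas of a key are alive than the consistency level requires, the write is rejected with UnavailableException. This is an illustration only, not Cassandra's actual code. The function names are hypothetical, and the sketch assumes a single data center and covers only the ONE, TWO, THREE, QUORUM, and ALL levels.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Number of live replicas a write needs at the given consistency level.

    Hypothetical helper for illustration; assumes a single data center.
    """
    level = consistency_level.upper()
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def write_is_available(consistency_level: str,
                       replication_factor: int,
                       live_replicas: int) -> bool:
    """True if enough replicas of the key are alive to attempt the write."""
    return live_replicas >= required_replicas(consistency_level, replication_factor)


if __name__ == "__main__":
    # (CL, RF, live replicas for a key whose replica set includes the downed node)
    scenarios = [
        ("ONE", 1, 0),
        ("ONE", 3, 2),
        ("QUORUM", 3, 2),
        ("ALL", 3, 2),
    ]
    for cl, rf, live in scenarios:
        status = "ok" if write_is_available(cl, rf, live) else "unavailable"
        print(f"CL={cl} RF={rf} live={live}: {status}")
```

With RF=1, each key lives on exactly one node, so any key owned by the downed node has zero live replicas and fails even at CL ONE, regardless of which nodes the client connects to. With RF=3 and one node down, writes at ONE and QUORUM can still succeed, while ALL cannot.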


On 2012/10/28, at 13:13, Andrew Bialecki <andrew.bialecki@gmail.com> wrote:

Hey everyone,

I'm trying to simulate what happens when a node goes down to make sure my cluster can gracefully handle node failures. My setup is a 3-node cluster running 1.1.5. I'm running the stress tool included in 1.1.5 from an external server with the following arguments:

tools/bin/cassandra-stress -d <server1>,<server2>,<server3> -n 1000000

I start the stress test and then take down one of the nodes. The stress test instantly fails with errors like the following (the same error from different threads):

          ...
Operation [158320] retried 10 times - error inserting key 0158320 ((UnavailableException))
Operation [158429] retried 10 times - error inserting key 0158429 ((UnavailableException))
Operation [158439] retried 10 times - error inserting key 0158439 ((UnavailableException))
Operation [158470] retried 10 times - error inserting key 0158470 ((UnavailableException))
158534,0,0,NaN,43
FAILURE

I'm sure my naive setup is flawed in some way, but I was hoping that when the node went down, writes destined for it would fail and go to one of the other nodes in the cluster instead. So the question is: why are writes failing even after a retry? It might be that the stress client doesn't pool connections (I took a quick look, but might not have looked deeply enough). However, I also tried specifying only the first two server nodes and then downing the third, with the same failure.

Thanks in advance.

Andrew