Date: Wed, 15 Jun 2011 17:38:30 -0500
Subject: Easy way to overload a single node on purpose?
From: Suan Aik Yeo
To: user@cassandra.apache.org

Here's a weird one... what's the best way to get a Cassandra node into a "half-crashed" state?

We have a 3-node cluster running 0.7.5. A few days ago this happened organically to node1: the partition the commitlog was on was 100% full and there was a "No space left on device" error. After a while, although the cluster and node1 were still up, the other nodes saw node1 as down, and messages like:

    DEBUG 14:36:55,546 ... timed out

started to show up in its debug logs.

We have a tool to indicate to the load balancer that a Cassandra node is down, but it didn't detect it that time. Now I'm having trouble purposely getting the node back into that state so that I can try other monitoring methods. I've tried filling up the commitlog partition with other files, and although I get the "No space left on device" error, the node still doesn't go down or show the other symptoms it showed before.

Also, if anyone could recommend a good way for a node itself to detect that it's in such a state, I'd be interested in that too. Currently we make a "describe_cluster_name()" Thrift call, but that still worked when the node was "down". I'm thinking of something like reading/writing a fixed value in a keyspace as a check... Unfortunately Java-based solutions are out of the question.
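
For illustration, the read/write check could look roughly like the following. This is a minimal sketch in Python, not a tested monitoring tool. The names canary_check, write and read are hypothetical: write and read stand in for whatever Thrift client calls store and fetch one fixed column on the node being checked, and the 2-second timeout is an arbitrary assumption.

```python
import threading
import uuid


def canary_check(write, read, timeout=2.0):
    """Return True only if a write followed by a read of the same value
    completes within `timeout` seconds.

    `write(value)` and `read()` are hypothetical callables supplied by the
    caller; they are assumed to wrap the client calls that store and fetch
    one fixed column on the node under test.
    """
    token = uuid.uuid4().hex  # fresh value, so a stale read cannot pass
    result = {"ok": False}

    def attempt():
        try:
            write(token)
            result["ok"] = (read() == token)
        except Exception:
            # Any client error (timeout, unavailable, I/O) counts as unhealthy.
            result["ok"] = False

    # Run the round trip in a daemon thread so a hung call cannot block
    # the check itself or keep the process alive.
    worker = threading.Thread(target=attempt)
    worker.daemon = True
    worker.start()
    worker.join(timeout)
    return (not worker.is_alive()) and result["ok"]
```

The idea is that the check only passes if a real write and a real read both finish in time, so it exercises more of the node than a "describe_cluster_name()" call does. A wrapper script could turn the boolean into an exit code for the load-balancer tool; whether this would actually have caught the "half-crashed" state described above is untested.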

Thanks,
Suan