MIME-Version: 1.0
Date: Wed, 15 Jun 2011 17:38:30 -0500
Message-ID: <BANLkTi=u5bsDNr_cigJzeFHvPOafbgyo_Q@mail.gmail.com>
Subject: Easy way to overload a single node on purpose?
From: Suan Aik Yeo <yeosuanaik@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=0016e6d97123d4d65e04a5c7cf18

--0016e6d97123d4d65e04a5c7cf18
Content-Type: text/plain; charset=ISO-8859-1

Here's a weird one... what's the best way to get a Cassandra node into a
"half-crashed" state?

We have a 3-node cluster running 0.7.5. A few days ago this happened
organically to node1: the partition holding the commitlog was 100% full and
there was a "No space left on device" error. After a while, although the
cluster and node1 were both still up, the other nodes saw node1 as down, and
messages like:
    DEBUG 14:36:55,546 ... timed out
started to show up in its debug logs.
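One cheap signal, at least for this particular failure, is the symptom itself: the "timed out" lines in the debug log. A minimal sketch, assuming debug logging stays enabled and the lines keep the `DEBUG HH:MM:SS,mmm ... timed out` shape quoted above; since the middle of the message is elided, only the level, timestamp and suffix are matched, and the threshold is a hypothetical value to tune.

```python
import re
from typing import Iterable

# Matches lines like "DEBUG 14:36:55,546 ... timed out". The middle of the
# message is elided in the report, so only the DEBUG level, the
# HH:MM:SS,mmm timestamp and the trailing "timed out" are checked.
TIMEOUT_LINE = re.compile(r"^\s*DEBUG (\d{2}:\d{2}:\d{2},\d{3}) .*timed out\s*$")


def count_timeouts(lines: Iterable[str]) -> int:
    """Return how many log lines look like the 'timed out' symptom."""
    return sum(1 for line in lines if TIMEOUT_LINE.match(line))


def looks_half_crashed(lines: Iterable[str], threshold: int = 10) -> bool:
    """Hypothetical rule: flag the node once timeouts reach `threshold`."""
    return count_timeouts(lines) >= threshold
```

It would be fed, say, the last few hundred lines of the node's log; where that log lives depends on the log4j configuration, so the path is left to the caller.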

We have a tool that tells the load balancer when a Cassandra node is
down, but it didn't detect the problem that time. Now I'm having trouble
deliberately getting the node back into that state so that I can try other
monitoring methods. I've tried filling up the commitlog partition with other
files, and although I get the "No space left on device" error, the node
still doesn't go down or show the other symptoms it showed before.
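To make the fill-the-partition experiment repeatable, a small helper can write one filler file until the filesystem reports ENOSPC. This is a sketch under stated assumptions: the `filler.bin` name is hypothetical, `max_bytes` exists only so it can be tried safely on a scratch directory first, and whether a full disk alone reproduces the half-crashed state is exactly what is still in question.

```python
import errno
import os
from typing import Optional, Tuple


def fill_partition(directory: str, chunk_size: int = 1 << 20,
                   max_bytes: Optional[int] = None) -> Tuple[str, int, bool]:
    """Write a filler file into `directory` until the disk is full.

    Returns (path, bytes_written, hit_enospc). Delete `path` afterwards to
    give the space back.
    """
    path = os.path.join(directory, "filler.bin")  # hypothetical file name
    chunk = b"\0" * chunk_size
    written = 0
    # buffering=0 gives a raw file, so nothing is left to flush on close.
    with open(path, "wb", buffering=0) as f:
        try:
            while max_bytes is None or written < max_bytes:
                n = chunk_size if max_bytes is None else min(chunk_size, max_bytes - written)
                written += f.write(chunk[:n])
                os.fsync(f.fileno())  # make the filesystem allocate blocks now
        except OSError as e:
            if e.errno != errno.ENOSPC:
                raise
            return path, written, True
    return path, written, False
```

Keeping everything in a single file makes cleanup one `os.remove(path)` once the test is over.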

Also, if anyone could recommend a good way for a node itself to detect that
it's in such a state, I'd be interested in that too. Currently we make a
describe_cluster_name() Thrift call, but that still succeeded while the node
was "down". I'm thinking of something like reading/writing a fixed value in a
keyspace as a check... Unfortunately, Java-based solutions are out of the
question.
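The read/write idea can be sketched without Java. A minimal sketch, assuming the client exposes a blocking write and read: the `write`/`read` callables, the `healthcheck` key and the 2-second timeout are all hypothetical. Unlike describe_cluster_name(), which only reports the configured cluster name, a write has to go through the storage path, so a hang or an error there is the kind of failure the current check can miss.

```python
import threading
import uuid
from typing import Callable, Dict, Optional


def canary_check(write: Callable[[str, str], None],
                 read: Callable[[str], Optional[str]],
                 key: str = "healthcheck",
                 timeout: float = 2.0) -> bool:
    """Write a fresh value under a fixed key and read it back.

    Healthy only if both calls finish within `timeout` seconds and the
    value read back is the one just written.
    """
    value = uuid.uuid4().hex  # fresh each run, so a stale read can't pass
    result: Dict[str, object] = {}

    def probe() -> None:
        try:
            write(key, value)
            result["read"] = read(key)
        except Exception as exc:  # any client error counts as unhealthy
            result["error"] = exc

    # Daemon thread: a hung client call can't keep the checker process alive.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        return False  # still blocked after `timeout`: treat the node as down
    return "error" not in result and result.get("read") == value
```

With the pycassa client, for example, one could pass `write=lambda k, v: cf.insert(k, {"v": v})` and `read=lambda k: cf.get(k)["v"]` for some `cf = pycassa.ColumnFamily(pool, "HealthCheck")`; the column family name is hypothetical and would have to exist. Because the node receiving the request only coordinates it, the key should be one the local node is a replica for if the point is to exercise its own commitlog.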


Thanks,
Suan

--0016e6d97123d4d65e04a5c7cf18--