Subject: Re: UnavailableException when data grows
From: Rana Aich <aichrana@gmail.com>
To: user@cassandra.apache.org
Date: Mon, 27 Sep 2010 14:55:20 -0700

Hi Peter,

Thanks for your detailed questions.

I have an 8-machine cluster: KVSHIGH1,2,3,4 and KVSLOW1,2,3,4. As the
names suggest, the KVSLOW boxes have little disk space (roughly 350 GB),
whereas the KVSHIGH boxes have 1.5 TB.
Yet my nodetool output shows the following:

192.168.202.202  Down  319.94 GB  7200044730783885730400843868815072654    |<--|
192.168.202.4    Up    382.39 GB  23719654286404067863958492664769598669   |   ^
192.168.202.2    Up    106.81 GB  36701505058375526444137310055285336988   v   |
192.168.202.3    Up    149.81 GB  65098486053779167479528707238121707074   |   ^
192.168.202.201  Up    154.72 GB  79420606800360567885560534277526521273   v   |
192.168.202.204  Up    72.91 GB   85219217446418416293334453572116009608   |   ^
192.168.202.1    Up    29.78 GB   87632302962564279114105239858760976120   v   |
192.168.202.203  Up    9.35 GB    87790520647700936489181912967436646309   |-->|

As you can see, one of our KVSLOW boxes (192.168.202.202) is already
down; it's 100% full. Meanwhile, a box with 1.5 TB of disk
(192.168.202.1) holds only 29.78 GB! I'm using RandomPartitioner. When I
run the client program, the Cassandra daemon takes around 85-130% CPU.
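[Aside: the tokens above are far from evenly spaced over
RandomPartitioner's 0..2**127 token space; RandomPartitioner only
balances load when the initial tokens themselves are balanced, and the
node with the lowest token (192.168.202.202) also inherits the large
wrap-around range. A minimal sketch of computing evenly spaced initial
tokens; the node list is copied from the ring above, everything else is
purely illustrative:]

    # Evenly spaced initial tokens for RandomPartitioner, whose token
    # space is 0 .. 2**127. Node list copied from the nodetool ring
    # output above; illustrative only.
    NODES = [
        "192.168.202.202", "192.168.202.4",   "192.168.202.2",
        "192.168.202.3",   "192.168.202.201", "192.168.202.204",
        "192.168.202.1",   "192.168.202.203",
    ]

    RING = 2 ** 127

    for i, node in enumerate(NODES):
        # Each node gets token i * RING / N, splitting the ring evenly.
        print("%-16s %d" % (node, i * RING // len(NODES)))
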

Regards,

Rana

On Mon, Sep 27, 2010 at 2:31 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:
> > How can I handle this kind of situation?
>
> In terms of surviving the problem, a retry on the client side might
> help, assuming the problem is temporary.
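
[A minimal sketch of such a client-side retry, assuming a Thrift-style
client where Cassandra signals this condition with UnavailableException;
the request callable, attempt count and back-off values are all
placeholders:]

    import time

    class UnavailableException(Exception):
        """Stand-in for the Thrift cassandra.ttypes.UnavailableException."""

    def with_retries(request, attempts=3, base_delay=0.5):
        # Retry a Cassandra call on UnavailableException, backing off
        # exponentially between attempts; re-raise once we give up.
        for attempt in range(attempts):
            try:
                return request()
            except UnavailableException:
                if attempt == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)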

> However, certainly the fact that you're seeing an issue to begin with
> is interesting, and the way to avoid it would depend on what the
> problem is. My understanding is that the UnavailableException
> indicates that the node you are talking to was unable to read
> from/write to a sufficient number of nodes to satisfy your consistency
> level. Presumably either because individual requests failed to return
> in time, or because the node considers other nodes to be flat out
> down.
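
[To make that concrete: at consistency level QUORUM the coordinator
needs floor(RF/2) + 1 live replicas, so with RF=3 it can tolerate one
dead replica but not two. A hypothetical helper illustrating the check,
not Cassandra's actual code:]

    # Illustration of the availability check described above.
    # QUORUM needs floor(RF / 2) + 1 live replicas; ONE needs 1; ALL needs RF.
    def required_replicas(rf, level):
        return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]

    def would_be_unavailable(rf, live, level="QUORUM"):
        return live < required_replicas(rf, level)

    # With RF=3 and one replica down, QUORUM still succeeds...
    assert not would_be_unavailable(3, 2)
    # ...but with two replicas down it fails up front.
    assert would_be_unavailable(3, 1)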

> Can you correlate these issues with server-side activity on the nodes,
> such as background compaction, commitlog rotation or memtable
> flushing? Do you see your nodes saying that other nodes in the cluster
> are "DOWN" and "UP" (flapping)?
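
[One rough way to spot flapping is to scan each node's system.log for
the gossip state-change messages; the exact "is now UP" / "is now dead"
phrasing assumed below may vary by version, so treat the pattern as an
assumption and adjust it to what your logs actually say:]

    import re
    import sys

    # Rough sketch: count gossip state changes per node in a Cassandra
    # system.log. The log-line phrasing matched here is an assumption.
    PATTERN = re.compile(r"InetAddress /?([\d.]+) is now (UP|dead)")

    def flap_counts(path):
        counts = {}
        with open(path) as log:
            for line in log:
                match = PATTERN.search(line)
                if match:
                    node = match.group(1)
                    counts[node] = counts.get(node, 0) + 1
        return counts

    if __name__ == "__main__":
        for node, n in sorted(flap_counts(sys.argv[1]).items()):
            print("%-16s %d state changes" % (node, n))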

> How large is the data set in total (in terms of sstable size on disk),
> and how much memory do you have in your machines (going to page
> cache)?

> Have you observed the behavior of your nodes during compaction; in
> particular, whether compaction is CPU bound or I/O bound? (That would
> tend to depend on the data; generally, the larger the individual
> values, the more disk bound you'd tend to be.)

> Just trying to zero in on what the likely root cause is in this case.

> --
> / Peter Schuller
