From: aaron morton <aaron@thelastpickle.com>
Subject: Re: Knowing when there is a *real* need to add nodes
Date: Fri, 20 May 2011 14:13:39 +1200
To: user@cassandra.apache.org

Considering disk usage is a tricky one. Compacted SSTable files will remain on disk until either there is not enough space or the JVM GC runs. To measure the live space use the "Space used (live)" figure from cfstats; "Space used (total)" includes space that has been compacted but not yet deleted from disk.
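
Something like this shows the gap per column family (the grep labels are from memory, adjust to your cfstats output):

  # Compare live vs total space per column family; the difference is
  # compacted data still waiting to be unlinked from disk.
  nodetool cfstats --host localhost | \
    egrep 'Column Family:|Space used \(live\)|Space used \(total\)'
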
The data in deleted columns *may* be purged from disk during a minor or major compaction. This can happen before GCGraceSeconds has expired. It is only the tombstone that must be kept around for at least GCGraceSeconds.

I agree that 50% utilisation on the data directories is a sensible soft limit that will help keep you out of trouble. The space needed by the compaction depends on which bucket of files it is compacting, but it will always require at least as much free disk space as the files it is compacting. That should also leave headroom for adding new nodes, just in case. Ideally, when adding new nodes, existing nodes only stream data to the new nodes. If however you are increasing the node count by less than a factor of 2 you may need to make multiple moves and the nodes may need additional space.
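
As a very rough headroom check (the data dir path below is an assumption, point it at your DataFileDirectories):

  # Keep at least as much free space as the live data you already have,
  # so any compaction bucket can fit. The path is an example only.
  DATA_KB=$(du -sk /var/lib/cassandra/data | awk '{print $1}')
  FREE_KB=$(df -k /var/lib/cassandra/data | awk 'NR==2 {print $4}')
  [ "$FREE_KB" -lt "$DATA_KB" ] && echo "free space below live data size - compaction headroom is tight"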

To gauge the throughput I would also look at the latency trackers on the o.a.c.db.StorageProxy MBean. They track the latency of complete requests, including talking to the rest of the cluster. The metrics on the individual column families are concerned with the local read.
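
The local per-CF numbers are easy to pull from cfstats; the StorageProxy attributes (RecentReadLatencyMicros etc., names from memory) need a JMX client such as jconsole:

  # Local (per column family) read/write latency as reported by cfstats.
  nodetool cfstats --host localhost | \
    egrep 'Column Family:|Read Latency|Write Latency'
  # Cluster-level request latency lives on the StorageProxy MBean
  # (org.apache.cassandra.db:type=StorageProxy) - view it in jconsole.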

For the pending TP stats I would guess that, for the read and write pools, a pending value consistently higher than the number of threads assigned (in the config) would be something to investigate. Waiting on these stages will be reflected in the StorageProxy latency numbers. HintedHandoff, StreamStage and AntiEntropyStage will have tasks that stay in the pending queue for a while. AFAIK the other pools should not have many (< 10) tasks in the pending queue and should be able to clear it.
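
A rough way to flag that (32 is an assumed concurrent_reads/concurrent_writes value, and the column position assumes the Active/Pending/Completed layout of tpstats):

  # Flag read/write stages whose pending count exceeds the configured
  # thread count (32 here is only an example value).
  nodetool tpstats --host localhost | \
    awk '/ReadStage|MutationStage/ && $3 > 32 {print $1, "pending =", $3}'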

Hope that helps.
 
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 18 May 2011, at 19:50, Tomer B wrote:

As for static disk usage I would add this:

test: df -kh
description: run the test after compaction (check GCGraceSeconds in storage-conf.xml) as only then is data expunged permanently; run on the data disk, assuming the commitlog disk is separate from the data dir.
green gauge: used_space < 30% of disk capacity
yellow gauge: used_space 30% - 50% of disk capacity
red gauge: used_space > 50% of disk capacity
comments: Compactions can temporarily require up to 100% of the in-use space (data file dir) in the worst case. When approaching 50% or more of disk capacity, use raid0 for the data dir disk; if you cannot, try increasing your disk; if you cannot, consider adding nodes (or consider adding nodes first if that's what you prefer).
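
A small sketch of that gauge (the mount point is an assumption, thresholds as above):

  # Classify data-dir disk usage into the green/yellow/red bands above.
  PCT=$(df -k /var/lib/cassandra/data | awk 'NR==2 {gsub("%","",$5); print $5}')
  if   [ "$PCT" -lt 30 ]; then echo "green  (${PCT}% used)"
  elif [ "$PCT" -le 50 ]; then echo "yellow (${PCT}% used)"
  else                         echo "red    (${PCT}% used)"
  fi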

2011/5/12 Watanabe Maki <watanabe.maki@gmail.com>
It's an interesting topic for me too.
How about adding measurements of static disk utilization (% used) and memory utilization (rss, JVM heap, JVM GC)?
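
For the memory side, something along these lines works on most boxes (the pgrep pattern assumes the daemon class name appears in the command line):

  # Resident set size of the Cassandra JVM, plus heap occupancy and GC activity.
  PID=$(pgrep -f CassandraDaemon)
  ps -o rss= -p "$PID"          # resident memory in kB
  jstat -gcutil "$PID" 5000 5   # 5 samples, 5 seconds apart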

maki

From iPhone


On 2011/05/12, at 0:49, Tomer B <tomerbd1@gmail.com> wrote:
 
> Hi
>
> I'm trying to predict when my cluster will soon need new nodes
> added. I want a continuous graph telling me of my cluster health so
> that when I see my cluster becoming more and more busy (I want numbers
> & measurements) I will know I need to start purchasing more
> machines and get them into my cluster, so I want to know of that
> beforehand.
> I'm writing here what I came up with after doing some research over the net.
> I would highly appreciate any additional gauge measurements and ranges
> to test my cluster health against and to know beforehand when I'm
> going to need more nodes soon. Although I'm writing down green
> gauge, yellow gauge, red gauge, I'm also trying to find a continuous
> graph that tells where our cluster stands (as much as
> possible...)
>
> Also my recommendation is always before adding new nodes:
>
> 1. Make sure all nodes are balanced and if not balance them (see the ring check sketched after this list).
> 2. Separate commit log drive from data (SSTables) drive
> 3. Use mmap_index_only rather than auto for the disk access mode.
> 4. Increase disk IO if possible.
> 5. Avoid swapping as much as possible.
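
On point 1, a quick way to eyeball balance (the host is a placeholder):

  # Load and ownership per node; a balanced ring has roughly even "Owns"
  # percentages. Use nodetool move to rebalance if it does not.
  nodetool -h <cassandra_host> ring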
>
>
> As for my gauge tests for when to add new nodes:
>
> test: nodetool tpstats -h <cassandra_host>
> green gauge: no pending column with a count higher than ~100
> yellow gauge: pending columns 100-2000
> red gauge: larger than 3000
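
A sketch of that gauge (bands approximated from the above; column position assumes the Active/Pending/Completed layout):

  # Classify the worst pending count from tpstats into the bands above.
  MAX=$(nodetool tpstats -h localhost | awk 'NF>=4 && $3 ~ /^[0-9]+$/ {if ($3>m) m=$3} END {print m+0}')
  if   [ "$MAX" -lt 100 ];  then echo "green  (max pending $MAX)"
  elif [ "$MAX" -le 3000 ]; then echo "yellow (max pending $MAX)"
  else                           echo "red    (max pending $MAX)"
  fi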
>
> test: iostat -x -n -p -z 5 10  and iostat -xcn 5
> green gauge: kw/s + kr/s is below 25% of disk IO capacity
> yellow gauge: 20%-50%
> red gauge: 50%+
>
> test: iostat -x -n -p -z 5 10 and check the %b column
> green gauge: less than 10%
> yellow gauge:  10%-80%
> red gauge: 90%+
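
A sketch for the %b gauge (column positions assume Solaris-style iostat -xn output, and sd0 is a placeholder device name):

  # Watch the %b (busy) column for the data disk and bucket it roughly
  # into the bands above. %b is field 10 and the device name field 11 here.
  iostat -xn 5 | awk '$11 == "sd0" {
    b = $10
    if (b < 10)      print "green  %b=" b
    else if (b < 90) print "yellow %b=" b
    else             print "red    %b=" b
  }'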
>
> test: nodetool cfstats --host localhost
> green gauge: "SSTable count" item does not continually grow over time
> yellow gauge:
> red gauge: "SSTable count" item continually grows over time
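
To see whether the count is actually trending up, sample it over time and graph the output:

  # Log SSTable counts every 10 minutes; column families whose count
  # only ever grows are the ones to investigate.
  while true; do
    date
    nodetool cfstats --host localhost | egrep 'Column Family:|SSTable count'
    sleep 600
  done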
>
> test: ./nodetool cfstats --host localhost | grep -i pending
> green gauge: 0-2
> yellow gauge: 3-100
> red gauge: 101+
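
Sampled the same way, so a one-off spike is not mistaken for a trend:

  # Sample pending task counts once a minute; sustained values in the
  # yellow/red bands above are what matter, not brief spikes.
  while true; do
    date
    nodetool cfstats --host localhost | grep -i pending
    sleep 60
  done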
>
> I would highly appreciate any additional gauge measurements and = ranges
> in order to test my cluster health and to know ***beforehand*** = when
> i'm going to soon need more nodes.

