From: Hefeng Yuan <hfyuan@rhapsody.com>
Subject: Re: Calculate number of nodes required based on data
Date: Wed, 7 Sep 2011 10:09:42 -0700
To: user@cassandra.apache.org

Adi,

The reason we're attempting to add more nodes is to solve the long, simultaneous compactions, i.e. the performance issue, not the storage issue yet.

We have RF 5 and CL QUORUM for both reads and writes. We currently have 6 nodes, and when 4 nodes are compacting in the same period we're in trouble, especially on reads, since a quorum read will hit one of the compacting nodes anyway.

My assumption is that if we add more nodes, each node will carry less load, therefore need less compaction, and probably compact faster, eventually avoiding 4+ nodes compacting simultaneously.

Any suggestion on how to calculate how many more nodes to add? Or, more generally, how to plan the number of nodes required from a performance perspective?

Thanks,
Hefeng

On Sep 7, 2011, at 9:56 AM, Adi wrote:

> On Tue, Sep 6, 2011 at 3:53 PM, Hefeng Yuan <hfyuan@rhapsody.com> wrote:
> Hi,
>
> Is there any suggested way of calculating the number of nodes needed based on data?
>
> We currently have 6 nodes (each with 8G memory) with RF 5 (because we want to be able to survive the loss of 2 nodes).
> The memtable flush happens around every 30 min (while not compacting), with ~9 MB serialized bytes.
>
> The problem is that we see more than 3 nodes compacting at the same time, which slows down the application.
> (We tried increasing/decreasing compaction_throughput_mb_per_sec; it didn't help much.)
>
> So I'm thinking we should probably add more nodes, but I'm not sure how many.
> Based on the data rate, is there any suggested way of calculating the number of nodes required?
>
> Thanks,
> Hefeng
>
>
> What is the total amount of data?
> What is the total amount in the biggest column family?
>
> There is no hard limit per node. Cassandra gurus like more nodes :-). One number for 'happy Cassandra users' I have seen mentioned in discussions is around 250-300 GB per node. But you could store more per node by having multiple column families, each storing around 250-300 GB. The main problem is that repair, compaction, and similar operations take longer and require much more spare disk space.
>
> As for the slowdown in the application during compaction, I was wondering:
> What CL are you using for reads and writes?
> Make sure it is not a client issue - is your client hitting all nodes round-robin or in some other fashion?
>
> -Adi
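To make the read-path concern above concrete: with RF 5, a QUORUM operation needs 3 replicas, and with only 2 of 6 nodes idle, every replica set of 5 must contain at least 3 compacting nodes. A minimal sketch of that arithmetic (Python; the numbers are from this thread, the helper names are made up for illustration):

```python
def quorum(rf: int) -> int:
    """Cassandra's quorum: a majority of the replicas."""
    return rf // 2 + 1

def min_compacting_replicas(rf: int, nodes: int, compacting: int) -> int:
    """Lower bound on compacting nodes inside any replica set of size rf."""
    idle = nodes - compacting
    return max(0, rf - idle)

# Scenario from the thread: RF=5, 6 nodes, 4 compacting at once.
rf, nodes, compacting = 5, 6, 4
q = quorum(rf)                                          # 3 replicas per read
busy = min_compacting_replicas(rf, nodes, compacting)   # >= 3 of the 5 replicas

# A quorum read needs q responses, but only (rf - busy) = 2 idle replicas
# exist, so at least q - (rf - busy) = 1 responder is mid-compaction.
print(f"quorum={q}, compacting replicas >= {busy}: every QUORUM read "
      f"touches at least {q - (rf - busy)} compacting node(s)")
```

This is why adding nodes (or reducing overlapping compactions) helps reads even though storage is not yet the bottleneck: the read can only dodge a compacting node if enough idle replicas exist to form a quorum.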
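Adi's 250-300 GB-per-node figure can be turned into a back-of-the-envelope node count. A hedged sketch (Python; the raw-data size and the 2x headroom factor are assumptions for illustration, not numbers from the thread - only RF=5 and the 250-300 GB range come from the discussion):

```python
import math

def nodes_needed(total_gb: float, rf: int, per_node_gb: float,
                 headroom: float = 2.0) -> int:
    """Nodes required so each holds <= per_node_gb of replicated data.

    headroom leaves spare disk for compaction/repair, which can
    temporarily need as much space again as the data itself.
    """
    replicated = total_gb * rf * headroom
    return math.ceil(replicated / per_node_gb)

# Example: 500 GB of raw data (assumed figure), RF=5, targeting the
# conservative 250 GB/node end of the range mentioned above.
print(nodes_needed(500, rf=5, per_node_gb=250))  # -> 20
```

Note this only answers the storage side of Adi's questions ("what is the total amount of data?"); the compaction-overlap problem in the thread is a throughput concern, so a measured data rate per node would still be needed to size for performance rather than capacity.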