From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: ideal cluster size
Date: Mon, 23 Jan 2012 21:55:44 +1300

I second Peter's point: big servers are not always the best.

My experience (using spinning disks) is that 200 to 300 GB of live data load per node (including replicated data) is a sweet spot. Above this, the time taken for compaction, repair, off-node backups, node moves, etc. starts to be a pain.
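
To put rough numbers on that sweet spot, here is a back-of-the-envelope sketch. The 2 TB dataset, replication factor 3, and 250 GB target are made-up inputs to illustrate the arithmetic, not recommendations for any particular workload:

# Back-of-the-envelope node count for a target per-node load.
# All inputs below are illustrative assumptions; plug in your own numbers.
raw_data_gb = 2000               # unreplicated dataset size
replication_factor = 3           # copies of each row kept in the cluster
target_load_per_node_gb = 250    # middle of the 200-300 GB sweet spot above

total_load_gb = raw_data_gb * replication_factor
nodes_needed = -(-total_load_gb // target_load_per_node_gb)  # ceiling division

print("total load: %d GB, nodes needed: %d" % (total_load_gb, nodes_needed))
# total load: 6000 GB, nodes needed: 24
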
Also, suffering catastrophic failure of 1 node in 100 is a better situation than 1 node in 16.
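
To make the 1-in-100 versus 1-in-16 comparison concrete, here is a rough sketch of how much of the ring is degraded when a single node dies. It assumes evenly distributed token ranges and replication factor 3; both are assumptions for illustration, not measurements from a real cluster:

# Fraction of token ranges that lose one replica when a single node dies,
# assuming evenly distributed tokens and 3 copies of every range.
replication_factor = 3

for cluster_size in (16, 100):
    affected = float(replication_factor) / cluster_size
    print("%d-node cluster: ~%.1f%% of ranges lose a replica"
          % (cluster_size, affected * 100))
# 16-node cluster: ~18.8% of ranges lose a replica
# 100-node cluster: ~3.0% of ranges lose a replica

The larger cluster also re-replicates the dead node's data from many more peers, so the streaming load during recovery is spread more thinly.
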

My experience (using spinning disks) is = that 200 to 300 GB of live data load per node (including replicated = data) is a sweet spot. Above this the time taken for compaction, repair, = off node backups, node moves etc starts to be a = pain. 

Also, suffering catastrophic = failure of 1 node in 100 is a better situation that 1 node in = 16. 

Finally, when you have more servers with less high-performance disks, you also get more memory and more CPU cores.

(I'm obviously ignoring all the ops side here; automate with Chef or http://www.datastax.com/products/opscenter ).

wrt failure modes, I wrote this last year; it's about single-DC deployments, but you can probably work it out for multi-DC: http://thelastpickle.com/2011/06/13/Down-For-Me/

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/01/2012, at 1:18 PM, Thorsten von Eicken wrote:

> Good point. One thing I'm wondering about cassandra is what happens when
> there is a massive failure. For example, if 1/3 of the nodes go down or
> become unreachable. This could happen in EC2 if an AZ has a failure, or
> in a datacenter if a whole rack or UPS goes dark. I'm not so concerned
> about the time where the nodes are down. If I understand replication,
> consistency, ring, and such I can architect things such that what must
> continue running does continue.
> 
> What I'm concerned about is when these nodes all come back up or
> reconnect. I have a hard time figuring out what exactly happens other
> than the fact that hinted handoffs get processed. Are the restarted
> nodes handling reads during that time? If so, they could serve up
> massive amounts of stale data, no? Do they then all start a repair, or
> is this something that needs to be run manually? If many do a repair at
> the same time, do I effectively end up with a down cluster due to the
> repair load? If no node was lost, is a repair required or are the hinted
> handoffs sufficient?
> 
> Is there a manual or wiki section that discusses some of this and I just
> missed it?

On 1/21/2012 2:25 PM, Peter Schuller wrote:
Thanks for the responses! We'll = definitely go for powerful servers = to
reduce the total count. Beyond a dozen servers there = really doesn't seem
to be much point in trying to = increase count anymore for
Just be aware that if "big" servers imply *lots* of data = (especially
in relation to = memory size), it's not necessarily the best = trade-off.
Consider the time = it takes to do repairs, streaming, node = start-up,
etc.

If it's only = about CPU resources then bigger nodes probably make = more
sense if the h/w is cost = effective.


<= /html>= --Apple-Mail=_61AC350E-6C08-431E-AE15-6FB5600B8D8C--
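
Regarding the stale-read question quoted above: whether a just-restarted, not-yet-repaired replica can be the sole source of a read comes down to whether the read and write consistency levels overlap. A minimal sketch of that check, with replication factor 3 assumed purely for illustration:

# If read_replicas + write_replicas > replication_factor, every read must
# contact at least one replica that acknowledged the latest successful write,
# and the coordinator returns the newest value by timestamp. With no overlap
# (e.g. ONE/ONE), a restarted node can serve stale data until hinted handoff,
# read repair, or a manually run repair catches it up.
def overlap_guaranteed(read_replicas, write_replicas, replication_factor):
    return read_replicas + write_replicas > replication_factor

replication_factor = 3
for label, r, w in (("read ONE / write ONE", 1, 1),
                    ("read QUORUM / write QUORUM", 2, 2),
                    ("read ONE / write QUORUM", 1, 2)):
    print("%s -> overlap guaranteed: %s"
          % (label, overlap_guaranteed(r, w, replication_factor)))
# read ONE / write ONE -> overlap guaranteed: False
# read QUORUM / write QUORUM -> overlap guaranteed: True
# read ONE / write QUORUM -> overlap guaranteed: False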