From: Ben Bromhead <ben@instaclustr.com>
Subject: Re: Heterogenous cluster and vnodes
Date: Sun, 31 Aug 2014 13:24:03 +1000
To: user@cassandra.apache.org

> Hey,
>
> I have a few VM host (bare metal) machines with varying amounts of free hard drive space on them. For simplicity let's say I have three machines like so:
>  * Machine 1:
>    - Hard drive 1: 150 GB available.
>  * Machine 2:
>    - Hard drive 1: 150 GB available.
>    - Hard drive 2: 150 GB available.
>  * Machine 3:
>    - Hard drive 1: 150 GB available.
>
> I am setting up a Cassandra cluster across them and, as I see it, I have two options:
>
> 1. I set up one Cassandra node/VM per bare metal machine. I assign all free hard drive space to each Cassandra node and I balance the cluster using vnodes proportionally to the amount of free hard drive space (CPU/RAM is not going to be a bottleneck here).
>
> 2. I set up four VMs, each running a Cassandra node with an equal amount of hard drive space and an equal number of vnodes. Machine 2 runs two VMs.

This setup will potentially create a situation where, if Machine 2 goes down, you may lose two replicas, as the two VMs on Machine 2 might hold replicas of the same key.
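
You can sanity-check this for any given key with nodetool (the keyspace, table and key below are just placeholders):

    nodetool getendpoints my_keyspace my_table some_partition_key

If two of the addresses it returns are the VMs on Machine 2, that key has two of its replicas sitting on the same physical box.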


General question: Is any of these preferable to the other? I = understand 1) yields lower high-availability (since nodes are on the = same hardware).

It's the other way around: 2 would potentially give you lower availability, because Cassandra thinks two of the VMs are separate when they in fact rely on the same underlying machine.
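
If you do run two VMs on one box, one way to make Cassandra aware of the physical layout is to treat each bare metal machine as a "rack" with GossipingPropertyFileSnitch, so that keyspaces using NetworkTopologyStrategy spread replicas across physical machines rather than just across VMs. A rough sketch (the DC/rack names are made up):

    # cassandra.yaml, on every node
    endpoint_snitch: GossipingPropertyFileSnitch

    # cassandra-rackdc.properties on both of Machine 2's VMs
    dc=DC1
    rack=machine2

    # cassandra-rackdc.properties on Machine 1's VM (Machine 3 gets rack=machine3)
    dc=DC1
    rack=machine1

With three "racks" and RF=3, each physical machine should end up holding one replica of any given row.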


Question about alternative 1: With varying vnodes, can I always be = sure that replicas are never put on the same virtual machine? =


Or is varying = vnodes really only useful/recommended when migrating from machines with = varying hardware (like mentioned in = [1])?

Changing the number of vnodes changes the portion of the ring a node is responsible for. You can use it to account for different types of hardware; you can also use it to create awesome situations like hotspots if you aren't careful… YMMV.
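
For option 1 that weighting is just num_tokens in cassandra.yaml, set before each node bootstraps for the first time. The numbers below are only an illustration of "proportional to disk", not a recommendation:

    # Machine 1 and Machine 3 (150 GB of disk each)
    num_tokens: 256

    # Machine 2 (300 GB of disk)
    num_tokens: 512

Keep in mind that, as far as I know, you can't change num_tokens on a node that already has data; you'd have to decommission it and bootstrap it again.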

At the end of the day I would throw out = the extra hard drive / not use it / put more hard drives in the other = machines. Why? Hard drives are cheap and your time as an admin for the = cluster isn't. If you do add more hard drives you can also split out the = commit log etc onto different disks.
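
For example, something along these lines in cassandra.yaml (the paths are made up):

    data_file_directories:
        - /mnt/disk1/cassandra/data
        - /mnt/disk2/cassandra/data
    commitlog_directory: /mnt/disk3/cassandra/commitlog

Keeping the commit log on its own disk stops its sequential writes from competing with reads and compaction on the data disks.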

I would = take less problems over trying to draw every last scrap of performance = out of the available hardware any day of the = year. 


Ben = Bromhead
Instaclustr | www.instaclustr.com | = @instaclustr | +61 = 415 936 359

= --Apple-Mail=_24E957C9-A910-420A-930F-F33EF7F4D2BA--