cassandra-user mailing list archives

From Ben Bromhead <>
Subject Re: Heterogenous cluster and vnodes
Date Sun, 31 Aug 2014 03:24:03 GMT

> Hey,
> I have a few VM host (bare metal) machines with varying amounts of free hard drive
> space on them. For simplicity, let’s say I have three machines like so:
>  * Machine 1:
>   - Hard drive 1: 150 GB available.
>  * Machine 2:
>   - Hard drive 1: 150 GB available.
>   - Hard drive 2: 150 GB available.
>  * Machine 3:
>   - Hard drive 1: 150 GB available.
> I am setting up a Cassandra cluster between them, and as I see it I have two options:
> 1. I set up one Cassandra node/VM per bare metal machine. I assign all free hard drive
> space to each Cassandra node, and I balance the cluster using vnodes proportionally to the
> amount of free hard drive space (CPU/RAM is not going to be a bottleneck here).
> 2. I set up four VMs, each running a Cassandra node with an equal amount of hard drive
> space and an equal number of vnodes. Machine 2 runs two VMs.
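For option 1, the proportional vnode counts work out as follows. This is a minimal sketch that assumes a base of 256 tokens (the hypothetical `BASE_TOKENS`) for a 150 GB node; each resulting value would go in that node's `num_tokens` setting in cassandra.yaml. The machine names are just labels for the three hosts above.

```python
# Hypothetical sizing helper for option 1: scale num_tokens by free disk space.
# BASE_TOKENS is an assumed value for the smallest node, not a recommendation.
BASE_TOKENS = 256

free_disk_gb = {
    "machine1": 150,
    "machine2": 150 + 150,  # two 150 GB drives
    "machine3": 150,
}

def tokens_for(disk_gb, smallest_gb, base=BASE_TOKENS):
    """num_tokens proportional to disk, with the smallest node getting `base`."""
    return round(base * disk_gb / smallest_gb)

smallest = min(free_disk_gb.values())
num_tokens = {name: tokens_for(gb, smallest) for name, gb in free_disk_gb.items()}

total = sum(num_tokens.values())
for name, n in num_tokens.items():
    # With randomly placed tokens the real share only approximates n / total.
    print(f"{name}: num_tokens={n}, expected share ~{n / total:.0%}")
```

Machine 2 ends up with twice the tokens of the other two machines and so should own roughly half of the ring.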

With this setup, if Machine 2 goes down you may lose two replicas at once, because the two
VMs on Machine 2 might both hold replicas of the same key.
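To see why, here is a toy model of option 2. It is a sketch, not Cassandra's code: it assumes RF = 3, SimpleStrategy-style placement with no rack awareness, and randomly chosen vnode tokens, and the VM names are hypothetical. Each key's replicas always land on distinct Cassandra nodes, yet for many keys two of them are the two VMs on Machine 2.

```python
import random
from bisect import bisect_left

RF = 3              # assumed replication factor (not stated in the thread)
TOKENS_PER_VM = 8   # assumed; kept small so the ring stays easy to inspect

# Option 2: four Cassandra nodes (VMs); vm2a and vm2b share Machine 2.
host_of = {"vm1": "machine1", "vm2a": "machine2", "vm2b": "machine2", "vm3": "machine3"}

rng = random.Random(42)
ring = sorted((rng.randrange(-2**63, 2**63), vm)
              for vm in host_of for _ in range(TOKENS_PER_VM))
tokens = [t for t, _ in ring]

def replicas(key_token):
    """SimpleStrategy-style placement: walk clockwise from the key's token and
    take the first RF distinct nodes, skipping further vnodes of a chosen node."""
    start = bisect_left(tokens, key_token)  # first vnode token >= key token
    found = []
    for step in range(len(ring)):
        vm = ring[(start + step) % len(ring)][1]  # modulo wraps around the ring
        if vm not in found:
            found.append(vm)
            if len(found) == RF:
                break
    return found

keys = [rng.randrange(-2**63, 2**63) for _ in range(2000)]
doubled = sum(1 for k in keys
              if [host_of[v] for v in replicas(k)].count("machine2") == 2)
print(f"{doubled / len(keys):.0%} of sampled keys have 2 of {RF} replicas on machine2")
```

With four nodes and RF = 3, every replica set leaves out exactly one node, so whenever the node left out is vm1 or vm3, both Machine 2 VMs hold a replica of that key. Losing Machine 2 then takes out two of the three replicas, and QUORUM (2 of 3) requests for those keys fail.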

> General question: Is either of these preferable to the other? I understand 1) yields
> lower availability (since nodes are on the same hardware).

It's the other way around: option 2 potentially gives lower availability. Cassandra thinks
two of the VMs are separate nodes when in fact they rely on the same underlying machine.

> Question about alternative 1: With varying vnodes, can I always be sure that replicas
> are never put on the same virtual machine?

Yes… mostly

> Or is varying vnodes really only useful/recommended when migrating from machines with
> varying hardware (like mentioned in [1])?

Changing the number of vnodes changes the portion of the ring a node is responsible for. You
can use it to account for different types of hardware, but if you aren't careful you can also
use it to create awesome situations like hotspots… YMMV.
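A rough way to see both effects is to simulate ownership. This sketch assumes purely random token selection (the default vnode behaviour at the time; newer token allocators may behave differently) and hypothetical node names; it computes the fraction of a Murmur3-sized ring each node owns for a given set of `num_tokens` values.

```python
import random
from collections import defaultdict

RING = 2**64  # size of the Murmur3Partitioner token space

def ownership(num_tokens, seed=0):
    """Fraction of the ring each node owns when its vnode tokens are random.

    num_tokens maps a (hypothetical) node name to its num_tokens value.
    """
    rng = random.Random(seed)
    ring = sorted((rng.randrange(-2**63, 2**63), node)
                  for node, n in num_tokens.items() for _ in range(n))
    owned = defaultdict(int)
    for i, (token, node) in enumerate(ring):
        prev = ring[i - 1][0]                 # for i == 0 this wraps to the highest token
        owned[node] += (token - prev) % RING  # the node owns the range (prev, token]
    return {node: owned[node] / RING for node in num_tokens}

configs = {
    "proportional, many tokens": {"m1": 256, "m2": 512, "m3": 256},
    "proportional, few tokens": {"m1": 2, "m2": 4, "m3": 2},
}
for label, tokens in configs.items():
    total = sum(tokens.values())
    shares = ownership(tokens, seed=1)
    for node, share in shares.items():
        print(f"{label:26} {node}: target {tokens[node] / total:.0%}, actual {share:.1%}")
```

With many tokens per node the actual shares should land close to the 25/50/25 targets; with only a handful of randomly placed tokens they can drift well away from them, which is one way to end up with a hotspot. Likewise, giving a node more tokens than its hardware can handle hands it a proportionally larger share of the load.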

At the end of the day, I would pull the extra hard drive, leave it unused, or put more hard
drives in the other machines. Why? Hard drives are cheap; your time as the cluster's admin
isn't. If you do add more hard drives, you can also split the commit log etc. out onto
separate disks.

I would take fewer problems over squeezing every last scrap of performance out of the
available hardware any day of the year.

Ben Bromhead
Instaclustr | | @instaclustr | +61 415 936 359
