From: Ben Bromhead <ben@instaclustr.com>
Subject: Re: Heterogenous cluster and vnodes
Date: Sun, 31 Aug 2014 13:24:03 +1000
To: user@cassandra.apache.org

> Hey,
>
> I have a few VM host (bare metal) machines with varying amounts of free hard drive space on them. For simplicity let's say I have three machines like so:
>  * Machine 1:
>    - Hard drive 1: 150 GB available.
>  * Machine 2:
>    - Hard drive 1: 150 GB available.
>    - Hard drive 2: 150 GB available.
>  * Machine 3:
>    - Hard drive 1: 150 GB available.
>
> I am setting up a Cassandra cluster across them and, as I see it, I have two options:
>
> 1. I set up one Cassandra node/VM per bare metal machine. I assign all free hard drive space to each Cassandra node and I balance the cluster using vnodes proportionally to the amount of free hard drive space (CPU/RAM is not going to be a bottleneck here).
>
> 2. I set up four VMs, each running a Cassandra node with an equal amount of hard drive space and an equal number of vnodes. Machine 2 runs two VMs.

This setup will potentially create a situation where, if Machine 2 goes down, you may lose two replicas, as the two VMs on Machine 2 might hold replicas of the same key.
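
You can sanity-check this for any given key with nodetool (the keyspace, table and key below are just placeholders):

    nodetool getendpoints my_keyspace my_table some_partition_key

If two of the addresses it returns are the VMs on Machine 2, that key has two of its replicas sitting on the same physical box.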


General question: Is any of these preferable to the other? I = understand 1) yields lower high-availability (since nodes are on the = same hardware).

It's the other way around: 2 would potentially give you lower availability, because Cassandra thinks two of the VMs are separate when they in fact rely on the same underlying machine.
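
If you do run two VMs on one box, one way to make Cassandra aware of the physical layout is to treat each bare metal machine as a "rack" with GossipingPropertyFileSnitch, so that keyspaces using NetworkTopologyStrategy spread replicas across physical machines rather than just across VMs. A rough sketch (the DC/rack names are made up):

    # cassandra.yaml, on every node
    endpoint_snitch: GossipingPropertyFileSnitch

    # cassandra-rackdc.properties on both of Machine 2's VMs
    dc=DC1
    rack=machine2

    # cassandra-rackdc.properties on Machine 1's VM (Machine 3 gets rack=machine3)
    dc=DC1
    rack=machine1

With three "racks" and RF=3, each physical machine should end up holding one replica of any given row.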


Question about alternative 1: With varying vnodes, can I always be = sure that replicas are never put on the same virtual machine? =


Or is varying = vnodes really only useful/recommended when migrating from machines with = varying hardware (like mentioned in = [1])?

Changing the number of vnodes changes the portion of the ring a node is responsible for. You can use it to account for different types of hardware; you can also use it to create awesome situations like hotspots if you aren't careful… YMMV.
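
For option 1 that weighting is just num_tokens in cassandra.yaml, set before each node bootstraps for the first time. The numbers below are only an illustration of "proportional to disk", not a recommendation:

    # Machine 1 and Machine 3 (150 GB of disk each)
    num_tokens: 256

    # Machine 2 (300 GB of disk)
    num_tokens: 512

Keep in mind that, as far as I know, you can't change num_tokens on a node that already has data; you'd have to decommission it and bootstrap it again.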

At the end of the day I would throw out = the extra hard drive / not use it / put more hard drives in the other = machines. Why? Hard drives are cheap and your time as an admin for the = cluster isn't. If you do add more hard drives you can also split out the = commit log etc onto different disks.
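
For example, something along these lines in cassandra.yaml (the paths are made up):

    data_file_directories:
        - /mnt/disk1/cassandra/data
        - /mnt/disk2/cassandra/data
    commitlog_directory: /mnt/disk3/cassandra/commitlog

Keeping the commit log on its own disk stops its sequential writes from competing with reads and compaction on the data disks.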

I would = take less problems over trying to draw every last scrap of performance = out of the available hardware any day of the = year. 


Ben = Bromhead
Instaclustr | www.instaclustr.com | = @instaclustr | +61 = 415 936 359

= --Apple-Mail=_24E957C9-A910-420A-930F-F33EF7F4D2BA--