Subject: Re: RFC: Cassandra Virtual Nodes
From: Tom Wilkie <tom@acunu.com>
To: dev@cassandra.apache.org
Date: Wed, 21 Mar 2012 14:24:01 -0500

Hi Edward

> 1) No more raid 0. If a machine is responsible for 4 vnodes they
> should correspond to four JBOD disks.

So each vnode corresponds to a disk? I suppose we could have a
separate data directory per disk (see the cassandra.yaml sketch
below), but I think this should be a separate, subsequent change.

However, do note that making each vnode roughly the size of a disk
(and only having 4-8 per machine) would make any non-hotswap rebuilds
slower. To get the fast distributed rebuilds, you need at least as
many vnodes per node as there are nodes in the cluster. And you would
still need the distributed rebuilds to deal with disk failure.
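To make both points concrete: the per-disk layout could, I think, be
approximated today by listing one data directory per spindle in
cassandra.yaml (the mount points below are hypothetical), although
Cassandra spreads sstables across these directories rather than
pinning a given vnode's data to one disk:

    data_file_directories:
        - /mnt/disk1/cassandra/data
        - /mnt/disk2/cassandra/data
        - /mnt/disk3/cassandra/data
        - /mnt/disk4/cassandra/data

And here is a back-of-the-envelope sketch (in Python) of the rebuild
argument - a deliberately simplified model with illustrative numbers,
not measurements:

    # Rebuild wall-clock time, assuming each vnode's range can be
    # streamed concurrently from a different surviving peer.
    def rebuild_hours(data_per_node_tb, num_nodes, vnodes_per_node,
                      stream_mb_per_s=50):
        # Parallelism is capped both by the number of vnodes and by
        # the number of surviving nodes that can act as sources.
        parallelism = min(vnodes_per_node, num_nodes - 1)
        mb_per_source = data_per_node_tb * 1e6 / parallelism
        return mb_per_source / stream_mb_per_s / 3600.0

    # 2 TB node in a 100-node cluster:
    print(rebuild_hours(2, 100, 4))    # ~2.8 hours with 4 vnodes
    print(rebuild_hours(2, 100, 128))  # ~0.1 hours (capped at 99 peers)

So with only a handful of disk-sized vnodes, a rebuild is bottlenecked
on a handful of source nodes, regardless of cluster size.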
> 2) Vnodes should be able to be hot plugged. My normal cassandra
> chassis would be a 2U with 6 drive bays. Imagine I have 10 nodes.
> Now if my chassis dies I should be able to take the disks out and
> physically plug them into another chassis. Then in cassandra I
> should be able to run a command like:
>
>   nodetool attach '/mnt/disk6'
>
> disk6 should contain all data and its vnode information.
>
> Now this would be awesome for upgrades/migrations/etc.

You know, you're not the first person I've spoken to who has asked
for this! I do wonder whether it is optimising for the right thing,
though - in my experience, disks fail more often than machines.

Thanks

Tom