Thanks for the helpful responses. The upgrade from 0.8 to 1.2 is not direct: we have set up a test cluster where we upgraded from 0.8 to 1.1 and then to 1.2. Also, we will build a completely separate cluster on 1.2; the 0.8 cluster itself will not be upgraded, but its data will be moved to the 1.2 cluster.
The reason I am looking at virtual nodes is the terrible experience we had with repairs on 0.8; according to the documentation (and logically), virtual nodes should make repairs smoother. Is this true? Also, how do I choose the right number of virtual nodes? David suggested 64 vnodes for 20 machines. Is there a formula or a thought process to follow to get this number right?

On Mon, Jul 29, 2013 at 4:15 AM, aaron morton <> wrote:
I would *strongly* recommend against upgrading from 0.8 directly to 1.2. Skipping a major version is generally not recommended; skipping 3 would seem like carelessness.

> I second Romain, do the upgrade and make sure the health is good first.
+1 but I would also recommend deciding if you actually need to use virtual nodes. The shuffle process can take a long time and people have had mixed experiences with it. 

If you wanted to move to 1.2 and get vnodes, I would consider spinning up a new cluster and bulk loading into it. You could do an initial load and then do delta loads using snapshots; there would, however, be a period of stale data in the new cluster until the last delta snapshot is loaded.
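
A minimal sketch of the "initial load, then delta loads" idea, assuming each snapshot is a flat directory of SSTable files and relying on the fact that SSTables are immutable once written, so a file name already loaded from an earlier snapshot does not need to be loaded again. The function name `delta_files` and the file names in the demo are made up for illustration; the actual bulk-load step is left out. Note that compaction can fold old rows into a newly named file, so a delta may re-send data that was already loaded.

```python
import tempfile
from pathlib import Path


def delta_files(previous_snapshot, current_snapshot):
    """Return files in current_snapshot whose names are absent from previous_snapshot."""
    previous_names = {p.name for p in Path(previous_snapshot).iterdir() if p.is_file()}
    return sorted(
        p for p in Path(current_snapshot).iterdir()
        if p.is_file() and p.name not in previous_names
    )


# Demo with throwaway directories and made-up SSTable file names.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "snap-initial")
    second = Path(tmp, "snap-delta")
    first.mkdir()
    second.mkdir()
    for name in ("ks-cf-ic-1-Data.db", "ks-cf-ic-2-Data.db"):
        (first / name).touch()
    for name in ("ks-cf-ic-2-Data.db", "ks-cf-ic-3-Data.db"):
        (second / name).touch()

    # Only the file that did not exist in the first snapshot is selected.
    new_names = [p.name for p in delta_files(first, second)]
    print(new_names)
```

Only the files returned by `delta_files` would need to be handed to the bulk loader for each delta pass.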


Aaron Morton
Cassandra Consultant
New Zealand


On 27/07/2013, at 3:36 AM, David McNelis <> wrote:

I second Romain, do the upgrade and make sure the health is good first.

If you have or plan to have a large number of nodes, you might consider using fewer than 256 as your initial number of vnodes. I think the number in the docs is higher than is reasonable: some people have reported potential performance degradation with a large number of nodes and a very high number of vnodes. If I had it to do over again, I'd have used 64 vnodes as my default (across 20 nodes).
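
To get a feel for the trade-off, here is a rough simulation, not a formula. It assumes each node picks its tokens uniformly at random on a simplified ring of 2**64 positions; the function name `ownership_spread` and the ring size are assumptions made for illustration, not Cassandra internals. It reports the total number of tokens in the cluster (nodes x num_tokens, which is what grows with a high vnode count) and how much more than its ideal share the busiest node owns. The exact figures depend on the random seed.

```python
import random

RING_SIZE = 2**64  # simplified ring, chosen only for illustration


def ownership_spread(num_nodes, num_tokens, seed=0):
    """Return (total_tokens, busiest node's share divided by the ideal share)."""
    rng = random.Random(seed)
    # Every node picks num_tokens random positions on the ring.
    tokens = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * num_nodes
    # A token owns the range from the previous token up to itself;
    # the first token wraps around from the last one.
    previous = tokens[-1][0] - RING_SIZE
    for position, node in tokens:
        owned[node] += position - previous
        previous = position
    ideal = RING_SIZE / num_nodes
    return len(tokens), max(owned) / ideal


for num_tokens in (1, 16, 64, 256):
    total, worst = ownership_spread(20, num_tokens)
    print(f"num_tokens={num_tokens:>3}  total tokens={total:>5}  "
          f"busiest node owns {worst:.2f}x its ideal share")
```

For 20 nodes, 64 vnodes per node gives 1280 tokens in the ring and 256 gives 5120. More tokens per node tends to even out ownership, at the cost of a larger ring to manage.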

Another thing to be very cognizant of before shuffling is disk space.  You *must* have less than 50% of your disk used to complete the shuffle successfully: no data is removed (cleaned) from a node during the shuffle, so the process essentially doubles the amount of data until you're able to run a cleanup.
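
A quick pre-flight check for the 50% rule could look like the sketch below. It uses Python's standard `shutil.disk_usage` and measures the whole filesystem that holds the data directory, which is a simplification if anything else shares that disk. The function name `shuffle_headroom_ok` is made up for illustration, and `DATA_DIR` is a placeholder to replace with your own data directory.

```python
import shutil

DATA_DIR = "."  # placeholder: point this at your Cassandra data directory


def shuffle_headroom_ok(data_dir, max_used_fraction=0.5):
    """Return (ok, used_fraction) for the filesystem that holds data_dir."""
    usage = shutil.disk_usage(data_dir)
    used_fraction = usage.used / usage.total
    return used_fraction < max_used_fraction, used_fraction


ok, used = shuffle_headroom_ok(DATA_DIR)
print(f"{used:.0%} of the disk is used; "
      f"{'enough' if ok else 'NOT enough'} headroom for a shuffle")
```

Run it on every node before starting, since a single node over the threshold is enough to cause trouble.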

On Fri, Jul 26, 2013 at 11:25 AM, Romain HARDOUIN <> wrote:
Vnodes are a great feature. More nodes are involved during operations such as bootstrap, decommission, etc.
DataStax documentation is definitely a must read.
That said, if I were you, I'd wait a while before shuffling the ring. I'd focus on the cluster upgrade and on monitoring the nodes (number of file handles, memory usage, latency, etc.).
Upgrading from 0.8 to 1.2 can be tricky; there have been so many changes since then. Be careful about the compaction strategies you choose and double-check the options.


rash aroskar <> wrote on 25/07/2013 23:25:11:

> From: rash aroskar <>

> To:,
> Date: 25/07/2013 23:25
> Subject: cassandra 1.2.5- virtual nodes (num_token) pros/cons?
> Hi,

> I am upgrading my Cassandra cluster from 0.8 to 1.2.5.
> In Cassandra 1.2.5 the 'num_tokens' attribute confuses me.
> I understand that it assigns multiple tokens to each node, but I am
> not clear on how that helps performance or load balancing. Can
> anyone elaborate? Has anyone used this feature and can share its
> advantages/disadvantages?

> Thanks,

> Rashmi