Thank you, Edward.

I suspect that nodetool cleanup is I/O intensive, so running it concurrently on the entire cluster may significantly impact the I/O performance of applications.

Apart from this, do you see any other implications of running nodetool cleanup concurrently on the entire cluster?

Thank you
Emalayan


From: Edward Capriolo <edlinuxguru@gmail.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>; Emalayan Vairavanathan <svemalayan@yahoo.com>
Sent: Monday, 10 June 2013 2:53 PM
Subject: Re: [Cassandra] Expanding a Cassandra cluster

You should eventually run cleanup to remove data that is no longer needed on the node. However, it does not need to be run immediately after a join. You can run it when you get around to it. I would run it on a few nodes at a time until they are all cleaned up.
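
The "few nodes at a time" approach could be scripted roughly as below. This is only a sketch under assumptions: it assumes nodetool is on the PATH of the machine running the script and can reach each node with `nodetool -h <host> cleanup`. The function names, the batch size, and the host list are hypothetical and not taken from any manual.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def nodetool_cleanup(host):
    """Run `nodetool -h <host> cleanup` and return its exit code.

    Assumption: nodetool is installed locally and JMX on `host` is reachable.
    """
    return subprocess.run(["nodetool", "-h", host, "cleanup"]).returncode


def cleanup_in_batches(hosts, batch_size, run=nodetool_cleanup):
    """Clean up `batch_size` nodes concurrently, one batch after another.

    Stops after the first batch in which any node fails, so a problem does
    not spread across the cluster. Returns (succeeded_hosts, failed_hosts).
    `run` is injectable so the batching logic can be tried without a cluster.
    """
    succeeded, failed = [], []
    for batch in chunked(hosts, batch_size):
        # Nodes within a batch run in parallel; batches run sequentially.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            codes = list(pool.map(run, batch))
        for host, code in zip(batch, codes):
            (succeeded if code == 0 else failed).append(host)
        if failed:
            break
    return succeeded, failed
```

For example, `cleanup_in_batches(["10.0.0.1", "10.0.0.2", "10.0.0.3"], batch_size=2)` would clean the first two (hypothetical) addresses together and then the third. A small batch size limits how many replicas are doing the extra disk work at the same time.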


On Mon, Jun 10, 2013 at 5:00 PM, Emalayan Vairavanathan <svemalayan@yahoo.com> wrote:
Hi All,

The DataStax manual suggests that, during a Cassandra cluster expansion, an administrator has to run nodetool cleanup on each of the previously existing Cassandra nodes to remove the keys that no longer belong to those nodes. The manual further says that nodetool cleanup should be run sequentially on the existing Cassandra nodes.


Here is my problem: I have a very large Cassandra cluster with hundreds of nodes, and running nodetool cleanup sequentially will take a long time to finish.

Questions:
a) Can someone tell me about the implications of running nodetool cleanup concurrently on the entire cluster?
b) Will Cassandra automatically take care of removing obsolete keys in the future?


Thank you
Emalayan