On 27.09.2010, at 19:30, Marc Canaleta wrote:

What do you mean by "running live"? I am also planning to run Cassandra on EC2 using Small instances. A Small instance has 1/4 the CPU of a Large one at 1/4 the cost, but its I/O should be more than 1/4 (Amazon does not publish explicit I/O numbers), so I think 4 Small instances should perform better than 1 Large one for the same cost. Am I wrong?

Based on the results we saw, and on what you can find in various sources around the web, EC2 Small instances deliver less than 1/4 of a Large instance's I/O performance.

On 27 September 2010 at 18:09:14 UTC+2, Jonathan Ellis wrote:
I strongly recommend not running live on Small nodes.  So in your case
I would recommend starting up Large instances with raid0'd disks, shut
down cassandra on the Small ones, rsync to the Large, and start up on

On Mon, Sep 27, 2010 at 6:46 AM, Utku Can Topçu wrote:
> Hi All,
> We're currently running a cassandra cluster with Replication Factor 3,
> consisting of 4 nodes.
> The current situation is:
> - The nodes are all identical (AWS small instances)
> - The data directory is on the /mnt partition, which has 150 GB capacity. Each
> node carries around 90 GB of load, so about 60 GB of free space is left per node.
> So adding a new node to the cluster seems likely to cause problems for us. I
> think the node that streams data to the new bootstrapping node
> will not have enough disk space to anticompact its data.
> What is the best practice for such scenarios?
> Regards,
> Utku
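
For reference, the disk-space concern in the quoted message can be put into numbers. The sketch below is only a back-of-the-envelope check, not anything Cassandra itself provides: the function names and the `streamed_fraction` parameter are hypothetical, and the fraction of a node's data that gets rewritten during anticompaction is an assumption (1.0 is the pessimistic case of a full temporary copy). The capacity and load figures are the ones given in the thread.

```python
def free_space_gb(capacity_gb: float, load_gb: float) -> float:
    """Free space left on the data partition, in GB."""
    return capacity_gb - load_gb


def can_anticompact(capacity_gb: float, load_gb: float,
                    streamed_fraction: float) -> bool:
    """Return True if the temporary anticompacted copy fits on disk.

    streamed_fraction is an ASSUMED share of the node's data that is
    rewritten to disk before being streamed to the bootstrapping node
    (0.5 = half of the data, 1.0 = worst case, a full copy).
    """
    if not 0.0 <= streamed_fraction <= 1.0:
        raise ValueError("streamed_fraction must be between 0 and 1")
    temp_copy_gb = load_gb * streamed_fraction
    return temp_copy_gb <= free_space_gb(capacity_gb, load_gb)


if __name__ == "__main__":
    capacity, load = 150.0, 90.0  # figures from the thread
    print("free space (GB):", free_space_gb(capacity, load))
    for fraction in (0.5, 1.0):
        fits = can_anticompact(capacity, load, fraction)
        print(f"streamed fraction {fraction}: fits = {fits}")
```

Under these assumptions, a node with 90 GB of load and 60 GB free can hold a temporary copy of half its data (45 GB) but not a full copy (90 GB), which is why the headroom looks tight.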

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support