What do you mean by "running live"? I am also planning to run Cassandra on EC2 using Small instances. A Small instance has 1/4 the CPU of a Large one at 1/4 the cost, but its I/O should be more than 1/4 (Amazon does not publish explicit I/O numbers), so I think 4 Small instances should perform better than 1 Large one for the same cost. Am I wrong?
On 27 September 2010 18:09:14 UTC+2, Jonathan Ellis <jbellis@gmail.com> wrote:
I strongly recommend not running live on Small nodes. So in your case
I would recommend starting up Large instances with raid0'd disks, shut
down cassandra on the Small ones, rsync to the Large, and start up on
Large.
--
On Mon, Sep 27, 2010 at 6:46 AM, Utku Can Topçu <utku@topcu.gen.tr> wrote:
> Hi All,
>
> We're currently running a cassandra cluster with Replication Factor 3,
> consisting of 4 nodes.
>
> The current situation is:
>
> - The nodes are all identical (AWS small instances)
> - Data directory is in the partition (/mnt) which has 150 GB capacity and each
> node has around 90 GB load, so 60 GB free space per node is left.
>
> So adding a new node to the cluster will seem to cause problems for us. I
> think the node which will stream the data to the new bootstrapping node,
> will not have enough disk space for anticompacting its data.
>
> What should be the best practice for such scenarios?
>
> Regards,
>
> Utku
>
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
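
The disk-space worry in the quoted message (about 90 GB of load and 60 GB free per node) can be checked with some rough arithmetic. The sketch below is only illustrative: the function name `anticompaction_fits` and the `streamed_fraction` parameter are made up for this example, and it assumes the temporary space needed on the streaming node is simply its load multiplied by the fraction of data handed to the new node. That is a simplification, not Cassandra's actual accounting.

```python
def anticompaction_fits(load_gb, free_gb, streamed_fraction):
    """Rough estimate of whether a node has room to anticompact.

    Assumption (not from Cassandra itself): the temporary space needed
    is load_gb * streamed_fraction, i.e. the data split off for the
    bootstrapping node is written out once before being streamed.
    Returns a tuple (needed_gb, fits).
    """
    if not 0.0 <= streamed_fraction <= 1.0:
        raise ValueError("streamed_fraction must be between 0 and 1")
    needed_gb = load_gb * streamed_fraction
    return needed_gb, needed_gb <= free_gb


# Figures from the thread: ~90 GB load, ~60 GB free per node.
# The fractions tried here are hypothetical scenarios.
for fraction in (0.5, 1.0):
    needed, fits = anticompaction_fits(90, 60, fraction)
    print(f"fraction={fraction}: needs ~{needed:.0f} GB, fits={fits}")
```

Under this assumption, handing off half of a node's data would fit in the free space, while a worst case that rewrites the full load would not, which is the situation the original question is concerned about.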