incubator-cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: Fill disks more than 50%
Date Thu, 24 Feb 2011 03:22:13 GMT
On Wed, Feb 23, 2011 at 9:39 PM, Terje Marthinussen
<> wrote:
> Hi,
> Given that you have always-increasing key values (timestamps) and never
> delete and hardly ever overwrite data.
> If you want to minimize work on rebalancing and statically assign (new)
> token ranges to new nodes as you add them so they always get the latest
> data....
> Let's say you add a new node each year to handle next year's data.
> In a scenario like this, would you with 0.7 be able to safely fill disks
> significantly more than 50% and still manage things like repair/recovery of
> faulty nodes?
> Regards,
> Terje

All your data for a given day/month/year would sit on the same server,
so the servers with old data would be idle while the servers with
current data would be very busy. This is probably not a good way to
go.
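
To illustrate the hot spot, here is a rough Python sketch. The node
names, the year-to-node mapping, and the write count are all made up
for the example; it only assumes that new writes carry current
timestamps and that each year's range is statically assigned to one
node, as described in the question.

```python
from collections import Counter

# Hypothetical layout: one node per year of data, as in the question.
node_for_year = {2009: "node-a", 2010: "node-b", 2011: "node-c"}

def node_for_write(year):
    """Return the node that owns the statically assigned range for a year."""
    return node_for_year[year]

# Assumption: all incoming writes have current (2011) timestamps.
writes = [2011] * 1000
load = Counter(node_for_write(year) for year in writes)

# Print how many writes each node received.
for node in node_for_year.values():
    print(node, load[node])
```

Under these assumptions every write lands on the newest node and the
older nodes receive none.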

There is a ticket open for 0.8 for more efficient node moves and
joins. It is already a lot better in 0.7. If you are really afraid of
joins, which you really should not be, you can join nodes using rsync
if you know some tricks (pretend you did not see this).

As for the 50% statement: in a worst-case scenario, a major compaction
will require double the disk size of your column family. So if you
have more than one column family, you do NOT need 50% overhead on the
whole disk.
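
As a back-of-the-envelope check, here is a small Python sketch of that
arithmetic. It is only an estimate under two assumptions that are mine,
not guarantees from Cassandra: only one column family is being
major-compacted at a time, and the compacted output is as large as the
input (nothing is dropped). The function names and sizes are invented
for the example.

```python
def worst_case_headroom(cf_sizes_gb):
    """Free space needed to major-compact the largest column family,
    assuming the output is as large as the input."""
    return max(cf_sizes_gb)

def max_fill_fraction(cf_sizes_gb):
    """Fraction of the disk that data can occupy while still leaving
    room for that worst-case compaction."""
    total = sum(cf_sizes_gb)
    return total / (total + worst_case_headroom(cf_sizes_gb))

# One 400 GB column family versus four column families of 100 GB each.
print(max_fill_fraction([400]))
print(max_fill_fraction([100, 100, 100, 100]))
```

With a single column family this gives the familiar 50%; with four
equal column families the same estimate allows about 80%. Anything else
that needs temporary space at the same time would lower the figure.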
