incubator-cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: Fill disks more than 50%
Date Fri, 25 Feb 2011 15:54:10 GMT
On Fri, Feb 25, 2011 at 7:38 AM, Terje Marthinussen
<> wrote:
>> @Thibaut Britz
>> Caveat:Using simple strategy.
>> This works because cassandra scans data at startup and then serves
>> what it finds. For a join for example you can rsync all the data from
>> the node below/to the right of where the new node is joining. Then
>> join without bootstrap then cleanup both nodes. (also you have to
>> shutdown the first node so you do not have a lost write scenario in
>> the time between rsync and new node startup)
> rsync all data from node to left/right..
> Wouldn't that mean that you need 2x the data to recover...?
> Terje


In your scenario, where you are never updating, running repair becomes
less important. I have an alternative for you. I have a program I call
the "RescueRanger". We use it to range-scan all our data, find old
entries, and then delete them. However, if we set that program to "read
only mode" and tell it to read at CL.ALL, it becomes a program that
read repairs data!
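
To make the idea concrete, here is a rough sketch of that kind of scan loop. It is not the actual RescueRanger code. Every name in it (`range_scan`, `rescue_scan`, `make_fake_reader`, and the `read_range(start_key, limit, consistency)` callable) is made up for illustration, and the reader here is an in-memory stand-in, not a real Cassandra client call. It also assumes rows come back in a stable order with an inclusive start key. With a real client you would plug in whatever range query it offers and page in the order the partitioner returns.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Tuple[str, dict]
# Hypothetical reader: return up to `limit` rows whose key is >= start_key,
# in key order, read at the given consistency level.
RangeReader = Callable[[str, int, str], List[Row]]


def range_scan(read_range: RangeReader, page_size: int = 100,
               consistency: str = "ALL") -> Iterator[Row]:
    """Yield every row once by paging through the whole key range."""
    if page_size < 2:
        # The start key is inclusive, so a page of 1 would never advance.
        raise ValueError("page_size must be at least 2")
    start = ""
    last_seen = None
    while True:
        page = read_range(start, page_size, consistency)
        for key, columns in page:
            if key == last_seen:
                continue  # first row of a page repeats the previous last row
            yield key, columns
        if len(page) < page_size:
            return  # short page means we reached the end of the range
        last_seen = page[-1][0]
        start = last_seen


def rescue_scan(read_range: RangeReader,
                is_old: Callable[[dict], bool],
                delete_row: Callable[[str], None],
                read_only: bool = True,
                page_size: int = 100) -> Tuple[int, int]:
    """Scan everything at ALL; delete old rows only when not read-only."""
    scanned = deleted = 0
    for key, columns in range_scan(read_range, page_size, "ALL"):
        scanned += 1
        if not read_only and is_old(columns):
            delete_row(key)
            deleted += 1
    return scanned, deleted


def make_fake_reader(data: Dict[str, dict], log: List[str]) -> RangeReader:
    """In-memory stand-in for a cluster, used only to exercise the loop."""
    keys = sorted(data)

    def read_range(start_key: str, limit: int, consistency: str) -> List[Row]:
        log.append(consistency)  # record the level each page was read at
        selected = [k for k in keys if k >= start_key][:limit]
        return [(k, data[k]) for k in selected]

    return read_range
```

In read-only mode the loop does nothing with the rows it gets back. The point is only that every row gets read at ALL, so the read itself is what gives the cluster the chance to read repair.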

This is a tradeoff. Range scanning through all your data is not fast,
but it does not require the extra disk space. Kinda like merge sort vs
bubble sort.
