incubator-cassandra-user mailing list archives

From Nick Bailey <n...@riptano.com>
Subject Re: Cassandra and disk space
Date Fri, 10 Dec 2010 00:15:37 GMT
Additionally, cleanup will fail to run when the disk is more than 50% full.
Another reason to stay below 50%.
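
The 50% guideline can be checked with a short script. This is a minimal sketch, not part of Cassandra: the function names are hypothetical, and the path passed in is an assumption that should be replaced with whatever volume holds your data directory.

```python
import shutil


def disk_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem containing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total


def over_threshold(used_fraction, threshold=0.5):
    """True when usage exceeds the threshold (50% by default, per this thread)."""
    return used_fraction > threshold


if __name__ == "__main__":
    # "." is a placeholder; point this at the volume holding your data files.
    fraction = disk_usage_fraction(".")
    status = "over" if over_threshold(fraction) else "under"
    print(f"{fraction:.1%} used, {status} the 50% guideline")
```

Running something like this from cron on each node would give a warning before cleanup or compaction runs out of room.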

On Thu, Dec 9, 2010 at 6:03 PM, Tyler Hobbs <tyler@riptano.com> wrote:

> Yes, that's correct, but I wouldn't push it too far.  You'll become much
> more sensitive to disk usage changes; in particular, rebalancing your
> cluster will be difficult, and repair will also become dangerous.
> Disk performance also tends to drop when a disk nears capacity.
>
> There's no recommended maximum size -- it all depends on your access
> rates.  Anywhere from 10GB to 1TB is typical.
>
> - Tyler
>
>
> On Thu, Dec 9, 2010 at 5:52 PM, Rustam Aliyev <rustam@code.az> wrote:
>
>>
>> That depends on your scenario.  In the worst case of one big CF, there's
>> not much that can be easily done for the disk usage of compaction and
>> cleanup (which is essentially compaction).
>>
>> If, instead, you have several column families and no single CF makes up
>> the majority of your data, you can push your disk usage a bit higher.
>>
>>
>> Is there any formula to calculate this? Let's say I have 500GB in a single
>> CF, so I need at least 500GB of free space for compaction. If I partition
>> this CF and split it into 10 equal CFs of 50GB each, does that mean I
>> will need only 50GB of free space?
>>
>> Also, is there a recommended maximum data size per node?
>>
>> Thanks.
>>
>>
>> A fundamental idea behind Cassandra's architecture is that disk space is
>> cheap (which, indeed, it is).  If you are particularly sensitive to this,
>> Cassandra might not be the best solution to your problem.  Also keep in mind
>> that Cassandra performs well with average disks, so you don't need to spend
>> a lot there.  Additionally, most people find that the replication protects
>> their data enough to allow them to use RAID 0 instead of 1, 10, 5, or 6.
>>
>> - Tyler
>>
>> On Thu, Dec 9, 2010 at 12:20 PM, Rustam Aliyev <rustam@code.az> wrote:
>>
>>>  Are there any plans to improve this in the future?
>>>
>>> For big data clusters this could be very expensive. Based on your
>>> comment, I will need 200TB of storage for 100TB of data to keep Cassandra
>>> running.
>>>
>>> --
>>>  Rustam.
>>>
>>> On 09/12/2010 17:56, Tyler Hobbs wrote:
>>>
>>> If you are on 0.6, repair is particularly dangerous with respect to disk
>>> space usage.  If your replica is sufficiently out of sync, you can triple
>>> your disk usage pretty easily.  This has been improved in 0.7, so repairs
>>> should use about half as much disk space, on average.
>>>
>>> In general, yes, keep your nodes under 50% disk usage at all times.  Any
>>> of: compaction, cleanup, snapshotting, repair, or bootstrapping (the latter
>>> two are improved in 0.7) can double your disk usage temporarily.
>>>
>>> You should plan to add more disk space or add nodes when you get close to
>>> this limit.  Once you go over 50%, it's more difficult to add nodes, at
>>> least in 0.6.
>>>
>>> - Tyler
>>>
>>> On Thu, Dec 9, 2010 at 11:19 AM, Mark <static.void.dev@gmail.com> wrote:
>>>
>>>> I recently ran into a problem during a repair operation where my nodes
>>>> completely ran out of space and my whole cluster was... well, clusterfucked.
>>>>
>>>> I want to know how to prevent this problem in the future.
>>>>
>>>> Should I make sure that at all times every node is under 50% of its disk
>>>> space? Are there any normal day-to-day operations that would cause any
>>>> one node to double in size that I should be aware of? If one or more nodes
>>>> surpass the 50% mark, what should I plan to do?
>>>>
>>>> Thanks for any advice
>>>>
>>>
>>>
>>
>
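
The arithmetic behind Rustam's question (one 500GB CF versus ten 50GB CFs) and Tyler's reply that the reasoning is correct can be sketched as follows. This is a rough estimate under stated assumptions, not a formula from the Cassandra project: it assumes only one column family is compacted at a time and that compacting a CF needs, in the worst case, free space equal to that CF's size. It ignores snapshots, repair, and bootstrapping, which the thread says can also double disk usage temporarily. The function names are hypothetical.

```python
def compaction_headroom_gb(cf_sizes_gb):
    """Worst-case free space (GB) needed to compact the largest column family.

    Assumes one compaction at a time and a temporary copy as large as the CF.
    """
    return max(cf_sizes_gb, default=0)


def max_fill_fraction(cf_sizes_gb):
    """Upper bound on disk fill that still leaves room for that compaction."""
    total = sum(cf_sizes_gb)
    if total == 0:
        return 0.0
    return total / (total + compaction_headroom_gb(cf_sizes_gb))


if __name__ == "__main__":
    single_cf = [500]      # one 500GB column family
    ten_cfs = [50] * 10    # the same data split into ten 50GB column families

    for name, sizes in (("single CF", single_cf), ("ten CFs", ten_cfs)):
        print(name,
              "headroom:", compaction_headroom_gb(sizes), "GB,",
              "max fill: {:.0%}".format(max_fill_fraction(sizes)))
```

For the single CF this reproduces the 50% rule. For ten equal CFs the bound is much higher, but as Tyler notes above, running close to it makes rebalancing and repair risky, so treat it as a ceiling rather than a target.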
