cassandra-user mailing list archives

From Aaron Turner <>
Subject Re: Disk configuration in new cluster node
Date Sun, 23 Sep 2012 19:54:56 GMT
On Fri, Sep 21, 2012 at 2:05 AM, aaron morton <> wrote:
>> Would it help if I partitioned the computing resources of my physical
>> machines into VMs?
> No.
> Just like cutting a cake into smaller pieces does not mean you can eat more
> without getting fat.
> In the general case, with regular HDDs, 1 GbE, 8 to 16 virtual cores and
> 8GB to 16GB of RAM, you can expect to comfortably run up to 400GB of data
> (maybe 500GB) per node. That is replicated storage, so 400 / 3 = 133GB of
> unique data if you replicate data 3 times.

Remember also that these numbers reflect total size of your sstables.
This is both good and bad:

1. Good, because if you use compression you can store more data.  I'm
doing time series data for network statistics and I'm seeing extremely
good compression numbers (better than 10:1).

2. Bad, because if you're doing a lot of deletes, the old data +
tombstones count against you until they're actually purged from disk.
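
As a rough sketch of the arithmetic behind both points (the function
name and the compression_ratio parameter are my own illustration, not
anything Cassandra provides), the sstable budget can be turned into an
estimate of unique, uncompressed data like this. It assumes every byte
on disk is live data, so deleted rows and tombstones that have not been
purged yet would eat into the result:

```python
def unique_data_capacity_gb(on_disk_budget_gb, replication_factor,
                            compression_ratio=1.0):
    """Estimate unique (pre-replication, uncompressed) data in GB.

    on_disk_budget_gb:  total sstable size a node is expected to handle
    replication_factor: number of copies of each row kept in the cluster
    compression_ratio:  uncompressed size / on-disk size (hypothetical
                        input; measure it on your own data)
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return on_disk_budget_gb * compression_ratio / replication_factor


# The quoted example: 400GB of sstables, replicated 3 times.
print(int(unique_data_capacity_gb(400, 3)))            # 133
# Same budget with data that compresses 10:1.
print(round(unique_data_capacity_gb(400, 3, 10.0)))    # 1333
```

The 10:1 figure is just the ratio I see on my own time series data;
plug in whatever you measure on yours.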

This can create rather interesting disk usage situations where my
"rolling 48 hours" CF of current data takes significantly more disk
space than my historical CF, which currently stores over 4 months' worth
of data.  I'm thinking about repairing the rolling 48 hours CF more
often and reducing its gc_grace time so that compaction has a better
chance of removing stale data from disk.
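
A back-of-the-envelope model shows why gc_grace matters so much for a
short rolling window. This is only a sketch under simplifying
assumptions: a steady write rate, every row deleted once it leaves the
window, and deleted data purged the moment gc_grace expires. In practice
compaction has to actually run over the right sstables, so treat the
result as a lower bound. The function name is hypothetical; the 10 day
figure is Cassandra's default gc_grace_seconds (864000).

```python
def steady_state_disk_ratio(retention_hours, gc_grace_hours):
    """Approximate on-disk size relative to the live data set.

    Rows stay live for retention_hours, then linger on disk for at
    least gc_grace_hours more before compaction may drop them.
    """
    if retention_hours <= 0:
        raise ValueError("retention_hours must be positive")
    if gc_grace_hours < 0:
        raise ValueError("gc_grace_hours must not be negative")
    return (retention_hours + gc_grace_hours) / retention_hours


DEFAULT_GC_GRACE_HOURS = 864000 / 3600   # 10 days, the Cassandra default

# Rolling 48 hours of data with the default gc_grace.
print(steady_state_disk_ratio(48, DEFAULT_GC_GRACE_HOURS))   # 6.0
# Same CF with gc_grace reduced to 24 hours.
print(steady_state_disk_ratio(48, 24))                       # 1.5
```

Note that gc_grace should stay longer than the interval between repairs,
otherwise deleted data can reappear, which is why the two changes go
together.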

Aaron Turner         Twitter: @synfinatic - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
