cassandra-user mailing list archives

From Jordan Pittier - Rezel <jor...@rezel.net>
Subject Re: Cassandra disk space utilization WAY higher than I would expect
Date Wed, 07 Jul 2010 17:23:53 GMT
I see the same thing here. I have tried to do the maths, including
timestamps, column names, keys and raw data, but in the end Cassandra reports
a cluster size 2 to 3 times bigger than the raw data. I am surely
missing something in my formula. I also have a lot of free hard drive space,
so it's not a big issue for me. Just puzzling.
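
For reference, a back-of-the-envelope estimate of this kind can be sketched in
Python using the figures quoted below (300,000 rows, one 100 KB column per row,
a 10-byte column name, 106 GB on disk). The function name, the 15-byte
per-column overhead, the zero default for key size, and the use of decimal
units are all assumptions for illustration, not measured or documented values.
Replication is deliberately left out because the thread does not state a
replication factor.

```python
def estimate_raw_bytes(rows, value_bytes, column_name_bytes,
                       key_bytes=0, per_column_overhead=15):
    """Rough payload estimate for rows that hold a single column.

    per_column_overhead is an ASSUMED figure covering the timestamp,
    length fields and flags stored with each column; key_bytes defaults
    to 0 because the thread does not give the row key size.
    """
    per_row = key_bytes + column_name_bytes + per_column_overhead + value_bytes
    return rows * per_row


# Figures quoted in the thread, using decimal units (1 KB = 1000 bytes).
rows = 300_000
value_bytes = 100 * 1000
raw = estimate_raw_bytes(rows, value_bytes, column_name_bytes=10)

on_disk = 106 * 1000**3          # 106 GB reported on disk
ratio = on_disk / raw            # how many times larger than the estimate

print(f"estimated raw data: {raw / 1000**3:.2f} GB")
print(f"on-disk / raw ratio: {ratio:.2f}x")
```

With these inputs the name, timestamp and length fields add only about 25
bytes to each 100,000-byte row, so per-column metadata alone cannot account
for the gap.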

On Wed, Jul 7, 2010 at 7:17 PM, Peter Schuller
<peter.schuller@infidyne.com> wrote:

> > I am thinking that the timestamps and column names should be included in
> > the column family stats, which basically says 300,000 rows that are 100KB
> > each = 30 GB.  My rows only have 1 column so there should only be one
> > timestamp.  My column name is only 10 bytes long.
> >
> > This doesn't explain why 30 GB of data is taking up 106 GB of disk 24
> > hours after all writes have completed.  Compactions should be complete, no?
>
> Nope, it sounds fishy to me. Presuming that compaction is not actively
> running in the background still (should be obvious from logs and/or
> CPU usage and/or disk I/O).
>
> --
> / Peter Schuller
>
