cassandra-user mailing list archives

From "Stu Hood" <stu.h...@rackspace.com>
Subject RE: Help! Cassandra disk space utilization WAY higher than I would expect
Date Fri, 09 Jul 2010 15:24:56 GMT
Cassandra currently has a high constant per-row overhead of around 40 bytes. Additionally,
there is around 12 bytes of overhead per column. Finally, column names are stored again in
every row, so they are paid for once per row rather than once per column family.
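
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The two overhead constants are the approximate figures above; the workload (row count, columns per row, name and value sizes) is purely hypothetical, and the model ignores row keys, index and filter files, replication, and any space held by not-yet-compacted SSTables, so treat the result as a lower bound rather than a prediction.

```python
# Rough on-disk size estimate using the approximate overheads quoted above.
ROW_OVERHEAD_BYTES = 40     # approximate constant per-row overhead
COLUMN_OVERHEAD_BYTES = 12  # approximate per-column overhead


def estimate_disk_bytes(rows, columns_per_row, avg_name_bytes, avg_value_bytes):
    """Estimate bytes on disk for one replica.

    Column names are counted once per row because they are repeated
    in every row, not stored once per column family.
    """
    per_column = COLUMN_OVERHEAD_BYTES + avg_name_bytes + avg_value_bytes
    per_row = ROW_OVERHEAD_BYTES + columns_per_row * per_column
    return rows * per_row


def blowup_factor(rows, columns_per_row, avg_name_bytes, avg_value_bytes):
    """Ratio of estimated on-disk bytes to raw value bytes."""
    raw_value_bytes = rows * columns_per_row * avg_value_bytes
    estimated = estimate_disk_bytes(
        rows, columns_per_row, avg_name_bytes, avg_value_bytes
    )
    return estimated / raw_value_bytes


# Hypothetical workload: 100 million rows, 10 columns per row,
# 8-byte column names and 10-byte values.
rows, cols, name_len, value_len = 100_000_000, 10, 8, 10
print(estimate_disk_bytes(rows, cols, name_len, value_len))       # 34000000000
print(round(blowup_factor(rows, cols, name_len, value_len), 2))   # 3.4
```

The smaller the values are relative to the column names and fixed overheads, the larger the ratio gets, which is why schemas with many tiny columns are hit hardest.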

CASSANDRA-674 and CASSANDRA-1207 will help with these overheads, but those tickets will not
be fixed until 0.8. The file format change should bring lovely things like compression and
variable-length encoding, both of which should benefit Cassandra hugely.

But, "disk is cheap"... the solution for now is to add more nodes. And why not?

Thanks,
Stu


-----Original Message-----
From: "Julie" <julie.sugar@nextcentury.com>
Sent: Friday, July 9, 2010 9:58am
To: user@cassandra.apache.org
Subject: Help! Cassandra disk space utilization WAY higher than I would expect

Hi guys,
I am on the hook to explain why 30GB of data is filling up 106GB of disk space,
since this is a concerning finding for my project.

We are very excited about the possibility of using Cassandra but need to
understand this anomaly in order to feel confident.  Does anyone know why this
could be happening?

cfstats reports that space used (live) is equal to space used (total), so I think
the data is truly taking up 106GB. I just can't explain why.

 		Space used (live): 113946099884
 		Space used (total): 113946099884

Thank you for any guidance!
Julie





