cassandra-user mailing list archives

From Sameer Farooqui
Subject Data overhead discussion in Cassandra
Date Thu, 14 Jul 2011 19:09:35 GMT
We just set up a 12-node demo cluster running Cassandra 0.8.1 and loaded
1.5 TB of data into it. However, the data files Cassandra wrote actually
take up 3 TB on disk. We're using a standard column family with a million
rows (key=string) and 35,040 columns per key. Each column name is a long
and each column value is a double.

I was just hoping to understand more about why the data overhead is so
large. We're not using expiring columns. Even considering indexing and bloom
filters, it shouldn't have bloated up the data size to 2x the original
amount. Or should it have?

How can we better anticipate the actual data usage on disk in the future?
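
One rough way to reason about this is a back-of-envelope estimate like the
sketch below. It is not an exact model of the on-disk format. The 15-byte
per-column and 23-byte per-row overheads are assumptions based on commonly
cited sizing guidance for Cassandra releases of this era, the 20-byte key
length is a made-up placeholder, and the function name and parameters are
hypothetical. It ignores index files, bloom filters, the commit log, and the
temporary extra space needed during compaction.

```python
# Assumed serialization overheads (not measured on this cluster):
# per column: name length, flags, timestamp, and value length fields
COLUMN_OVERHEAD_BYTES = 15
# per row: key length, row size, and related bookkeeping
ROW_OVERHEAD_BYTES = 23


def estimate_disk_bytes(rows, columns_per_row, key_bytes,
                        name_bytes, value_bytes, replication_factor=1):
    """Estimate data file size in bytes, summed across the cluster."""
    column_bytes = name_bytes + value_bytes + COLUMN_OVERHEAD_BYTES
    row_bytes = key_bytes + ROW_OVERHEAD_BYTES + columns_per_row * column_bytes
    return rows * row_bytes * replication_factor


# Figures from this message: 1 million rows, 35,040 columns per row,
# 8-byte long names and 8-byte double values. key_bytes=20 is a placeholder.
raw_payload = 1_000_000 * 35_040 * (8 + 8)
estimated = estimate_disk_bytes(1_000_000, 35_040, key_bytes=20,
                                name_bytes=8, value_bytes=8)

print(f"raw names + values: {raw_payload / 1e12:.2f} TB")
print(f"estimated on disk:  {estimated / 1e12:.2f} TB")
print(f"ratio:              {estimated / raw_payload:.2f}x")
```

With columns this small, the assumed fixed overhead is about as large as the
name and value combined, so under these assumptions a ratio close to 2x
before replication would not be surprising. Pass the keyspace's actual
replication factor to scale the estimate to the whole cluster.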

- Sameer
