incubator-cassandra-user mailing list archives

From Peter Schuller
Subject Re: Cassandra disk space utilization
Date Wed, 07 Jul 2010 16:21:49 GMT
> Keep in mind that there is additional data storage overhead, including
> timestamps and column names. Because the schema can vary from row to row,
> the column names are stored with each row, in addition to the data. Disk
> space-efficiency is not a primary design goal for Cassandra.

If the rows that are 200k (or was it 100k) are not single columns but
rather lots and lots of smaller columns, then this overhead will be
significant.
In addition, during compaction a column family can temporarily use up to
twice its normal disk space (during a major compaction all data will at
some point exist in duplicate).
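
For capacity planning, the compaction point can be sketched as below. This
assumes the worst case, where compaction drops nothing and the old and new
copies of the data fully coexist; the function names are made up for
illustration.

```python
def peak_disk_during_major_compaction(live_bytes):
    """Worst case: the old files and the newly written copy coexist."""
    return 2 * live_bytes


def max_compactable_column_family_size(disk_bytes):
    """Largest column family that still fits its own compacted copy on disk."""
    return disk_bytes // 2


GIB = 1024 ** 3
print(peak_disk_during_major_compaction(300 * GIB) // GIB)   # GiB needed at peak
print(max_compactable_column_family_size(1024 * GIB) // GIB)  # GiB of data a 1 TiB disk can hold
```

In practice this means keeping a column family at or below roughly half of
the available disk if major compactions need to be able to complete.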

/ Peter Schuller
