cassandra-user mailing list archives

From aaron morton <>
Subject Re: data size difference between supercolumn and regular column
Date Sat, 31 Mar 2012 07:28:21 GMT
> Does Cassandra 1.0 perform some default compression?

The on-disk size depends to some degree on the workload.

If there are a lot of overwrites or deletes, you may have rows/columns that still need to be compacted.
You may have some big old SSTables that have not been compacted for a while. 
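
To make the compaction point concrete, here is a toy Python model, not Cassandra's actual storage engine. It assumes each SSTable is a hypothetical dict mapping a column name to a `Cell`, that an overwrite or delete writes a new version instead of changing the old one, and that `compact` keeps only the newest version of each column. The names `Cell`, `on_disk_bytes`, and `compact` are illustrative, and the byte counts ignore all per-cell metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cell:
    value: Optional[bytes]  # None stands in for a tombstone (delete marker)
    timestamp: int


# Toy SSTable: column name -> cell, for a single row
SSTable = Dict[str, Cell]


def on_disk_bytes(sstables: List[SSTable]) -> int:
    # Every stored version takes space: superseded values and tombstones included
    return sum(
        len(name) + len(cell.value or b"")
        for table in sstables
        for name, cell in table.items()
    )


def compact(sstables: List[SSTable]) -> SSTable:
    merged: SSTable = {}
    for table in sstables:
        for name, cell in table.items():
            current = merged.get(name)
            # Keep the newest version of each column (ties keep the first one seen)
            if current is None or cell.timestamp > current.timestamp:
                merged[name] = cell
    # Simplification: purge tombstones at once; real Cassandra keeps them
    # until gc_grace_seconds has passed
    return {name: cell for name, cell in merged.items() if cell.value is not None}


older = {"a": Cell(b"x" * 100, 1), "b": Cell(b"y" * 100, 1)}
newer = {"a": Cell(b"z" * 100, 2), "b": Cell(None, 2)}  # overwrite "a", delete "b"
before = on_disk_bytes([older, newer])
after = on_disk_bytes([compact([older, newer])])
print(before, after)  # 304 101
```

Until the two tables are compacted, the stale value of "a" and the deleted "b" both still occupy disk space, which is why two clusters holding the same live data can report different on-disk sizes.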

There is some overhead involved in super columns: the super column name, the length of that name,
and the number of columns it contains.
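
A rough way to size that overhead is sketched below. The field widths (a 2-byte name-length prefix and a 4-byte column count) are assumptions for illustration only, not confirmed by this thread, and the real on-disk format may store additional fields; the super column names are hypothetical.

```python
from typing import Iterable

# Assumed field widths, for illustration only; check the actual
# serialization format of your Cassandra version before relying on them.
NAME_LENGTH_PREFIX_BYTES = 2
COLUMN_COUNT_BYTES = 4


def supercolumn_overhead(name: bytes) -> int:
    # The super column's name, the length of that name, and its column count
    return NAME_LENGTH_PREFIX_BYTES + len(name) + COLUMN_COUNT_BYTES


def total_supercolumn_overhead(names: Iterable[bytes]) -> int:
    # Sum of the per-super-column overhead across many super columns
    return sum(supercolumn_overhead(n) for n in names)


names = [b"chapter-%04d" % i for i in range(1000)]  # hypothetical super column names
print(supercolumn_overhead(names[0]))     # 18
print(total_supercolumn_overhead(names))  # 18000
```

Per super column the cost is small; it only matters when there are very many super columns relative to the size of the data inside them.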


Aaron Morton
Freelance Developer

On 29/03/2012, at 9:47 AM, Yiming Sun wrote:

> Actually, after I read an article on Cassandra 1.0 compression just now (,
> I am more puzzled. In our schema, we didn't specify any compression options. Does Cassandra
> 1.0 perform some default compression, or is the data reduction purely due to the schema
> change? Thanks.
> -- Y.
> On Wed, Mar 28, 2012 at 4:40 PM, Yiming Sun <> wrote:
> Hi,
> We are trying to estimate the amount of storage we need for a production Cassandra cluster.
> While I was doing the calculation, I noticed a dramatic difference in the storage space
> used by Cassandra data files.
> Our previous setup consists of a single-node Cassandra 0.8.x instance with no replication;
> the data is stored using supercolumns, and the data files total about 534GB on disk.
> A few weeks ago, I put together a cluster of 3 nodes running Cassandra 1.0 with a
> replication factor of 2, and the data is flattened out and stored using regular columns.
> The aggregate data file size is only 488GB (it would be 244GB without replication).
> This is a dramatic reduction in storage needs, and it is certainly good news in terms of how
> much storage we need to provision. However, because the reduction is so large, I would like
> to make sure the figure is correct before submitting it, and also get a sense of why there
> was such a difference. I know Cassandra 1.0 does data compression, but does the schema change
> from supercolumns to regular columns also help reduce storage usage?
> -- Y.
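
The arithmetic behind the 244GB figure can be sketched as below. It assumes the 488GB aggregate holds exactly two copies of every row and that both setups were in a comparable compaction state. The function names and the production replication factor of 3 are hypothetical. This only normalizes for replication; it cannot separate the effect of compression from that of the schema change.

```python
def per_replica_gb(aggregate_gb: float, replication_factor: int) -> float:
    # Each row is stored replication_factor times across the cluster
    return aggregate_gb / replication_factor


def reduction(old_gb: float, new_gb: float) -> float:
    # Fraction of the old footprint that is no longer needed
    return 1 - new_gb / old_gb


old_gb = 534.0                            # 0.8.x, single node, supercolumns, no replication
new_unique_gb = per_replica_gb(488.0, 2)  # 1.0, 3 nodes, regular columns, RF 2
print(new_unique_gb)                              # 244.0
print(f"{reduction(old_gb, new_unique_gb):.1%}")  # 54.3%

planned_rf = 3  # hypothetical production replication factor
provisioned_gb = new_unique_gb * planned_rf
print(provisioned_gb)                             # 732.0
```

Comparing per-replica sizes (244GB vs 534GB) rather than cluster totals keeps the comparison like-for-like before scaling back up by whatever replication factor production will use.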
