cassandra-user mailing list archives

From jmodha <>
Subject BulkLoading SSTables and compression
Date Thu, 28 Jun 2012 09:53:22 GMT

We are migrating our Cassandra cluster from v1.0.3 to v1.1.1. The data is
being loaded into an empty target cluster using SSTableLoader.

The data in the source cluster (v1.0.3) is uncompressed, and the column
family in the target cluster (v1.1.1) was created with compression turned on.

What we are seeing is that once the data has been loaded into the target
cluster, its size on disk is similar to that of the data in the source
cluster. We expected that, since compression is turned on in the target
cluster, the on-disk size would be reduced.

We have tried running the "rebuildsstables" nodetool command on a node after
the data has been loaded, and we do indeed see a huge reduction in size, e.g.
from 30GB to 10GB for a given column family. We were hoping to see this
reduction at the point of loading the data via SSTableLoader.

Is this behaviour expected? 

Do we need to run the rebuildsstables command on all nodes to actually
compress the data after it has been streamed in?
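
To make the second question concrete, here is a rough sketch of what running
the command on every node might look like. The host addresses, keyspace name
and column family name are hypothetical placeholders. The subcommand is passed
through exactly as named above, and the sketch assumes it accepts a keyspace
and column family argument the way "nodetool scrub" does. By default it only
prints the commands rather than running them.

```python
import subprocess


def build_commands(hosts, keyspace, column_family, subcommand="rebuildsstables"):
    """Return one nodetool argv list per host; nothing is executed here.

    The default subcommand is the name used in this message; the
    keyspace/column family arguments are an assumption.
    """
    return [
        ["nodetool", "-h", host, subcommand, keyspace, column_family]
        for host in hosts
    ]


def run_all(commands, dry_run=True):
    """Run the commands one node at a time; with dry_run, only print them."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops at the first node where nodetool fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical node addresses and names, for illustration only.
    hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    run_all(build_commands(hosts, "MyKeyspace", "MyColumnFamily"))
```

Going one node at a time keeps the extra disk and I/O load from rewriting
SSTables confined to a single node at any moment.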

