incubator-cassandra-user mailing list archives

From Peter Schuller <peter.schul...@infidyne.com>
Subject Re: Why data tripled in size after repair?
Date Wed, 26 Sep 2012 19:36:12 GMT
> What is strange every time I run repair data takes almost 3 times more
> - 270G, then I run compaction and get 100G back.

https://issues.apache.org/jira/browse/CASSANDRA-2699 outlines the
main issues with repair. In short - in your case the limited
granularity of the merkle trees is causing too much data to be streamed
(effectively duplicate data): when a single leaf of the tree
mismatches, the whole range covered by that leaf is streamed, even if
only one row in it differs.
https://issues.apache.org/jira/browse/CASSANDRA-3912 may be a band-aid
for you in that it allows the granularity to be much finer, and the
process to be more incremental.
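
To put rough numbers on the granularity point, here is a back-of-the-envelope
sketch in Python. It is a simplified model, not Cassandra's actual code:
rows_streamed() is a hypothetical helper, the leaf count of 2**15 per tree and
the row counts are assumptions chosen for illustration, and it takes the worst
case where every out-of-sync row falls under a different leaf.

```python
# Rough model of over-streaming caused by coarse merkle tree leaves.
# Assumption: each repaired range gets one tree with a fixed number of
# leaves, and a mismatching leaf causes every row under it to be streamed.

def rows_streamed(total_rows, leaves, differing_rows):
    """Worst-case rows streamed when each differing row hits its own leaf."""
    rows_per_leaf = -(-total_rows // leaves)  # ceiling division
    mismatched_leaves = min(differing_rows, leaves)
    return min(total_rows, mismatched_leaves * rows_per_leaf)

LEAVES = 2 ** 15  # assumed leaf count per tree, for illustration only

# One tree over a whole 100M-row range with 1000 rows out of sync.
whole_range = rows_streamed(100_000_000, LEAVES, 1000)

# Same data repaired as 100 subranges of 1M rows, 10 differing rows each.
subranges = 100 * rows_streamed(1_000_000, LEAVES, 10)

print(whole_range, subranges)
```

In this model the same 1000 differing rows cost far fewer streamed rows when
the range is repaired in smaller pieces, because each leaf then covers fewer
rows.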

A 'nodetool compact' decreases disk space usage temporarily, as you
have noticed, but it may also have a long-term negative effect on
steady state disk space usage depending on your workload. If you've got
a workload that's not limited to insertions only (i.e., you have
overwrites/deletes), a major compaction will tend to push steady state
disk space usage up - because you're creating a single sstable bigger
than what would normally be produced, and it takes more total disk
space before that sstable becomes part of a compaction again. Until
then, any data in it that has since been overwritten or deleted keeps
occupying disk.
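
As a rough illustration of why the big sstable lingers, here is a simplified
Python sketch of size-tiered compaction. It assumes an sstable is only
compacted once min_threshold sstables of similar size exist, with
min_threshold = 4 as an assumed setting; new_data_before_recompaction() is a
hypothetical helper, and real bucketing is more involved than this.

```python
# Simplified size-tiered model: an sstable is compacted again only once
# min_threshold sstables of roughly the same size have accumulated.
# min_threshold=4 is an assumed setting; check your own configuration.

def new_data_before_recompaction(sstable_size_gb, min_threshold=4):
    """GB of similar-sized sstables that must build up alongside this one."""
    return (min_threshold - 1) * sstable_size_gb

# The single 100G sstable left behind by a major compaction.
after_major = new_data_before_recompaction(100)

# A more typical sstable, assumed here to be a quarter of that size.
typical = new_data_before_recompaction(25)

print(after_major, typical)
```

Under these assumptions the 100G sstable is not rewritten until roughly three
more sstables of about that size exist, so overwritten or deleted rows inside
it stay on disk much longer than they would in smaller sstables.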

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
