It was the long time since the last repair that did it. We've scheduled regular repairs now, and this time the repair didn't increase the load very much. So that was it! :-)
Nothing unusual. When you run repair, Cassandra streams inconsistent regions from all replicas. If you have wide rows or didn't run repair regularly, it is very easy to get 10-20% of extra data from each replica, which is probably what happened in your case. In theory Cassandra should compact the new sstables you get from other nodes, but by default Cassandra only compacts sstables in the same size tier. Because of the major compaction you ran before, you have one big sstable and a bunch of small ones, so there is nothing to compact right now. Eventually Cassandra will compact them, but nobody knows when that will happen. This is one of the problems caused by major compaction: for maintenance it is better to have a set of small sstables than one big one. (There is a rough sketch of the size-tier bucketing idea below the quoted message.)

Andrey

On Thu, Nov 8, 2012 at 2:55 AM, Henrik Schröder <email@example.com> wrote:
We recently ran a major compaction across our cluster, which reduced the storage used by about 50%. That's fine and expected, since we do a lot of updates to existing data.
The day after, we ran a full repair -pr across the cluster, and when that finished, each storage node was at about the same size as before the major compaction. Why does that happen? What gets transferred to other nodes, and why does it suddenly take up a lot of space again?
We haven't run repair -pr regularly, so is this just something that happens on the first weekly run, and can we expect a different result next week? Or does repair always cause the data to grow on each node? To me it just doesn't seem proportional?
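To illustrate the "same size tier" point above, here is a rough Python sketch of the bucketing idea. This is not Cassandra's actual implementation; the threshold of 4 and the 0.5x-1.5x bucket bounds only approximate the size-tiered compaction defaults, and the sizes are made-up examples.

MIN_THRESHOLD = 4                    # approximate size-tiered default min_compaction_threshold
BUCKET_LOW, BUCKET_HIGH = 0.5, 1.5   # "same tier" = within ~50% of the bucket average

def bucket_sstables(sizes_gb):
    """Group sstable sizes (GB) into buckets of roughly similar size."""
    buckets = []
    for size in sorted(sizes_gb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if BUCKET_LOW * avg <= size <= BUCKET_HIGH * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets

def compactable(buckets):
    """Only buckets with at least MIN_THRESHOLD sstables get compacted."""
    return [b for b in buckets if len(b) >= MIN_THRESHOLD]

# One 100 GB sstable left by the major compaction plus a few small ones
# streamed in by repair: the big one sits alone in its bucket, and the
# small ones have not reached the threshold yet, so nothing is compacted.
sizes_gb = [100, 2, 2.5, 3]
print(bucket_sstables(sizes_gb))               # [[2, 2.5, 3], [100]]
print(compactable(bucket_sstables(sizes_gb)))  # []

Once enough small sstables accumulate in a tier they get merged, and the lone big sstable only rejoins the cycle when the merged results grow into its tier, which is why it can take a long time for the space to come back down.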