cassandra-user mailing list archives

From: Peter Schuller
Subject: Re: nodetool repair uses insane amount of disk space
Date: Sat, 18 Aug 2012 04:46:35 GMT

> How come a node would consume 5x its normal data size during the repair
> process?

It likely varies with how far out of sync you happen to be, and with
whether you have a neighbor that has already been repaired and has
bloated up as a result.

> My setup is kind of strange in that it's only about 80-100GB of data on a 35
> node cluster, with 2 data centers and 3 racks, however the rack assignments
> are unbalanced.  One data center has 8 nodes, and the other data center is
> split into 2 racks with one rack of 9 nodes, and the other with 18 nodes.
> However, within each rack, the tokens are distributed equally. It's a long
> sad story about how we ended up this way, but it basically boils down to
> having to utilize existing resources to resolve a production issue.

As for the DCs: different DCs are effectively independent of each
other in terms of replica placement, so there is no need for two DCs
to be symmetrical.

The racks are important though if you are trying to take advantage of
racks being somewhat independent failure domains (for reasons outlined
in 3810 above).
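
To make the rack point a bit more concrete, here is a toy sketch (not
Cassandra's actual code) of rack-aware replica placement within a single
DC. It assumes a simplified rule: walk the ring clockwise from the range's
owner and take the next node from each rack that has not been used yet.
The rack sizes of 9 and 18 come from the setup quoted above; the
replication factor of 2, the ring size of 5400, and all names in the code
are assumptions made up for illustration.

```python
from collections import Counter

RING_SIZE = 5400  # hypothetical token space, chosen so 9 and 18 divide it evenly


def build_ring(rack_sizes, ring_size=RING_SIZE):
    """Give each rack evenly spaced tokens, offset by one per rack to avoid ties."""
    ring = []
    for offset, (rack, count) in enumerate(sorted(rack_sizes.items())):
        step = ring_size // count
        for i in range(count):
            ring.append((i * step + offset, rack, f"{rack}{i}"))
    return sorted(ring)


def replicas_for(ring, start, rf):
    """Walk clockwise from ring[start], preferring racks not yet used."""
    chosen, skipped, racks_seen = [], [], set()
    for k in range(len(ring)):
        _, rack, name = ring[(start + k) % len(ring)]
        if rack in racks_seen:
            skipped.append(name)
        else:
            chosen.append(name)
            racks_seen.add(rack)
        if len(chosen) == rf:
            return chosen
    # Fewer racks than rf: fall back to the nodes that were skipped.
    return (chosen + skipped)[:rf]


def load_per_node(ring, rf, ring_size=RING_SIZE):
    """Sum the width of every token range each node holds a replica of."""
    load = Counter()
    for i, (token, _, _) in enumerate(ring):
        width = (token - ring[i - 1][0]) % ring_size  # range (previous token, token]
        for name in replicas_for(ring, i, rf):
            load[name] += width
    return load


ring = build_ring({"A": 9, "B": 18})  # rack sizes from the quoted setup
load = load_per_node(ring, rf=2)      # rf=2 is an assumption

for rack in ("A", "B"):
    per_node = [v for name, v in load.items() if name.startswith(rack)]
    print(rack, "nodes:", len(per_node), "average load:", sum(per_node) / len(per_node))
```

In this toy model each rack ends up holding one replica of every range,
so each of the 9 nodes in the smaller rack carries twice as much data as
each of the 18 nodes in the larger one, even though tokens are evenly
spaced within each rack.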

/ Peter Schuller (@scode,
