Reading some more (someone break in when I lose my clue ;-)
Reading the streams page in the wiki about anticompaction, I think the best approach when a node's disks get overfull is to set the compaction thresholds to 0 on all nodes, decommission the overfull node, wait for the data to get redistributed, and then clean off the decommissioned node and bootstrap it. Since the disks are too full for an anticompaction, you can't move the token on that node.
Given this, I wonder about the right approach to capacity planning. If I want to store, say, 500M rows, and I know based on current cfstats that the mean compacted row size is 27k, how much overhead is there on top of the 13.5 TB of raw data?
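To make the arithmetic concrete, here is a rough sketch of the estimate. The row count and row size come from the numbers above; reading "27k" as 27,000 bytes, a replication factor of 3, and keeping half of each disk free for (anti)compaction are assumptions for illustration, not measured overheads.

```python
# Back-of-envelope capacity estimate. The overhead factors are assumptions.
ROWS = 500_000_000        # target row count
MEAN_ROW_BYTES = 27_000   # "27k" mean compacted row size, read as 27,000 bytes
REPLICATION_FACTOR = 3    # assumed: every row is stored on three nodes
MAX_FILL = 0.5            # assumed: keep half of each disk free for compaction
TB = 10**12


def raw_bytes(rows, mean_row_bytes):
    """Size of one copy of the data, before replication or headroom."""
    return rows * mean_row_bytes


def provisioned_bytes(rows, mean_row_bytes, rf, max_fill):
    """Cluster-wide disk needed once replication and free headroom are included."""
    return raw_bytes(rows, mean_row_bytes) * rf / max_fill


print(raw_bytes(ROWS, MEAN_ROW_BYTES) / TB)
print(provisioned_bytes(ROWS, MEAN_ROW_BYTES, REPLICATION_FACTOR, MAX_FILL) / TB)
```

This ignores index files, bloom filters, and uncompacted SSTables, so the real figure would presumably be higher still.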
Trying to compute from what I have: in cfstats I have a total "Space used (total)" of around 1.6TB (this is only a subset of the data loaded so far), but when I check the data directories using du(1) I get around 23TB already used.
Ok, answered part of this myself. You can stop a node, move files around on the data disks, as long as they stay in the right keyspace directories, and all is fine.
Now, I have a single Data.db file which is 900GB and is compacted. The drive it's on is only 1.5TB, so it can't anticompact at all. Is there anything I can do? The replication factor is 3, so one idea is to take down the node, blow away the huge file, adjust the token, and restart the node. At that point I'm not sure what to tell the new node or other nodes to do... do I need to run a repair, or a cleanup, or a loadbalance, or ... what?
It would be great to be able to fix a storage quota on a per-data-directory basis, to ensure that enough capacity is retained for anticompaction. Default 45% quota, adjustable for the brave.
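A minimal sketch of what such a check could look like, done outside Cassandra as a monitoring script. The function names are hypothetical, and the check measures whole-filesystem usage via `shutil.disk_usage`, which only approximates a per-directory quota when each data directory sits on its own disk.

```python
import shutil

DEFAULT_QUOTA = 0.45  # fraction of a disk allowed to fill before warning


def over_quota(used_bytes, total_bytes, quota=DEFAULT_QUOTA):
    """Return True when used space exceeds the allowed fraction of the disk."""
    return used_bytes > quota * total_bytes


def check_data_dirs(paths, quota=DEFAULT_QUOTA):
    """Map each data directory to True if its filesystem is over quota."""
    result = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        result[path] = over_quota(usage.used, usage.total, quota)
    return result


# The 900GB Data.db on a 1.5TB drive fills 60% of the disk,
# so it is over a 45% quota.
print(over_quota(900 * 10**9, 1500 * 10**9))
```

Calling `check_data_dirs` with the list of data directories from the node's configuration would flag the disks that no longer have room for an anticompaction.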
Ian

On Tue, Jun 1, 2010 at 4:08 PM, Ian Soboroff <email@example.com> wrote:
My nodes have 5 disks and are using them separately as data disks. The usage on the disks is not uniform, and one is nearly full. Is there some way to manually balance the files across the disks? Pretty much anything done via nodetool incurs an anticompaction, which obviously fails. system/ is not the problem, it's in my data's keyspace.