Nicolai,

Perhaps you can check system.log to see if there are any errors during compaction. Also, I believe C* 1.2.0 is not a stable version.
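
In case it helps, here is a rough Python sketch for pulling the compaction-related WARN/ERROR lines out of system.log. It assumes each log line starts with the level followed by a bracketed thread name (e.g. a CompactionExecutor thread), which is how I remember the 1.2-era format. The function names, the keyword list, and the log path in the usage note are my own assumptions for illustration, so adjust them to your install.

```python
import re

# Assumed line layout: "LEVEL [ThreadName:N] <timestamp> <source> <message>".
# Only WARN and ERROR lines are considered.
LINE_RE = re.compile(r"^\s*(ERROR|WARN)\s+\[([^\]]+)\]\s+(.*)$")

# Hypothetical keyword list: substrings that make a line interesting here.
KEYWORDS = ("compaction", "no space left")


def compaction_problems(lines, keywords=KEYWORDS):
    """Return (level, thread, rest) tuples for WARN/ERROR lines that
    mention any of the keywords (case-insensitive)."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # INFO/DEBUG lines and stack-trace continuation lines
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append((match.group(1), match.group(2), match.group(3).strip()))
    return hits


def scan_file(path):
    """Run compaction_problems over a log file on disk."""
    with open(path, errors="replace") as handle:
        return compaction_problems(handle)
```

Something like scan_file("/var/log/cassandra/system.log") would then list the suspicious lines (that path is only a guess at your setup). Stack traces that follow an ERROR line are skipped by this sketch, so you would still want to open the log around each hit.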




On Thu, May 9, 2013 at 2:43 AM, Nicolai Gylling <ng@issuu.com> wrote:
Hi

I have a 3-node SSD-based cluster with around 1 TB of data, RF:3, C* v1.2.0, vnodes. One large CF, LCS. Everything was running smoothly until one of the nodes crashed and was restarted.

During normal operation there was 800 GB of free space on each node. After the crash, C* started using a lot more, resulting in an out-of-disk-space situation on 2 nodes; i.e., C* used up the 800 GB in just 2 days, giving us very little time to do anything about it, since repairs/joins take a considerable amount of time.

What can make C* suddenly use this amount of disk space? We did see a lot of pending compactions on one node (7k).

Any tips on recovering from an out-of-disk-space situation on multiple nodes? I've tried moving some SSTables away, but C* seems to use whatever space I free up in no time. I'm not sure if any of the nodes is fully up to date, as 'nodetool status' reports 3 different loads:

--  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.146.145.26     1.4 TB     256     100.0%            1261717d-ddc1-457e-9c93-431b3d3b5c5b  rack1
UN  10.148.149.141    1.03 TB    256     100.0%            f80bfa31-e19d-4346-9a14-86ae87f06356  rack1
DN  10.146.146.4      1.11 TB    256     100.0%            85d4cd28-93f4-4b96-8140-3605302e90a9  rack1


--

Sincerely,

Nicolai Gylling