cassandra-user mailing list archives

From Paul Pollack <paul.poll...@klaviyo.com>
Subject Re: Drastic increase in disk usage after starting repair on 3.7
Date Thu, 21 Sep 2017 01:21:45 GMT
Just a quick additional note -- we have checked and this is the only node
in the cluster exhibiting this behavior; disk usage is steady on all the
others. CPU load on the repairing node is slightly higher, but nothing
significant.

On Wed, Sep 20, 2017 at 9:08 PM, Paul Pollack <paul.pollack@klaviyo.com>
wrote:

> Hi,
>
> I'm running a repair on a node in my 3.7 cluster and today got alerted on
> disk space usage. We keep the data and commit log directories on separate
> EBS volumes. The data volume is 2TB. The node went down due to EBS failure
> on the commit log drive. I stopped the instance and was later told by AWS
> support that the drive had recovered. I started the node back up and saw
> that it couldn't replay commit logs due to corrupted data, so I cleared the
> commit logs and then it started up again just fine. I'm not worried about
> anything there that wasn't flushed, since I can replay it. I was
> unfortunately just outside the hinted handoff window, so I decided to run a
> repair.
>
> Roughly 24 hours after I started the repair, I got the alert on disk
> space. I checked and saw that right before I started the repair the node
> was using almost 1TB of space, which is right where all the nodes sit, and
> that over the course of 24 hours free space had dropped to about 200GB.
>
> My gut reaction was that the repair must have caused this increase, but
> I'm not convinced since the disk usage doubled and continues to grow. I
> figured we would see at most an increase of 2x the size of an SSTable
> undergoing compaction, unless there's more to the disk usage profile of a
> node during repair. We use SizeTieredCompactionStrategy on all the tables
> in this keyspace.
>
> Running nodetool compactionstats shows that there is a higher-than-usual
> number of pending compactions (currently 20), and that a large one of
> 292.82GB is moving slowly.
>
> Is it plausible that the repair is the cause of this sudden increase in
> disk space usage? Are there any other things I can check that might provide
> insight into what happened?
>
> Thanks,
> Paul
>
>
>
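
For reference, the figures in the quoted message work out roughly as follows. This is only a back-of-the-envelope sketch in Python: the numbers are the rounded ones from the message, 1TB is treated as 1000GB, and the check on the large compaction assumes a pessimistic upper bound (the output could be as large as the reported 292.82GB, ignoring whatever has already been written). None of this reflects exact on-disk accounting.

```python
# Rounded figures from the quoted message; 1 TB is treated as 1000 GB (assumption).
VOLUME_GB = 2000              # size of the data EBS volume
USED_BEFORE_GB = 1000         # "almost 1TB" in use right before the repair
FREE_AFTER_GB = 200           # free space about 24 hours into the repair
LARGE_COMPACTION_GB = 292.82  # large compaction seen in nodetool compactionstats

# Space in use now, and how much it grew during the repair.
used_after_gb = VOLUME_GB - FREE_AFTER_GB
growth_gb = used_after_gb - USED_BEFORE_GB
growth_factor = used_after_gb / USED_BEFORE_GB

# Pessimistic assumption: the compaction output could be as large as the
# reported total, and progress already written is ignored.
compaction_fits = LARGE_COMPACTION_GB <= FREE_AFTER_GB

print(f"used now:        {used_after_gb} GB")
print(f"growth:          {growth_gb} GB ({growth_factor:.1f}x the starting usage)")
print(f"compaction fits: {compaction_fits}")
```

Under these assumptions the node has grown by about 800GB, and the remaining free space is smaller than the large pending compaction, which is why the growth is a concern.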
