cassandra-user mailing list archives

From Paul Pollack <paul.poll...@klaviyo.com>
Subject Re: Drastic increase in disk usage after starting repair on 3.7
Date Thu, 21 Sep 2017 14:50:53 GMT
So I got to the bottom of this -- it turns out it's not an issue with
Cassandra at all. When these instances were originally set up we had
mounted 2TB drives from /dev/xvdc and persisted those mounts to
/etc/fstab, but at some point someone unmounted them and replaced them
with 4TB drives on /dev/xvdf without updating fstab. So what has
essentially happened is that I brought a node back into the cluster with
a blank data drive and started a repair, which I'm guessing then began
streaming in all the data that simply wasn't there. I've killed the
repair and am going to replace that node.
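
For anyone who runs into something similar, a quick sanity check along
the lines of the sketch below would have caught this before the node
rejoined. It's a minimal example and assumes the data volume is mounted
at /var/lib/cassandra (adjust the mount point and paths for your own
layout); it just compares the device /etc/fstab expects at that mount
point with what /proc/mounts actually reports:

    # Sketch: warn if the device in /etc/fstab for the Cassandra data
    # mount point differs from the device that is actually mounted.
    # Note fstab may list the device as UUID=... or LABEL=..., in which
    # case a mismatch here needs to be interpreted by hand.
    MOUNT_POINT = "/var/lib/cassandra"  # assumed mount point

    def device_for(path, mount_point):
        # Return the device column for mount_point in an fstab/mounts file.
        with open(path) as f:
            for line in f:
                fields = line.split()
                if (len(fields) >= 2 and not fields[0].startswith("#")
                        and fields[1] == mount_point):
                    return fields[0]
        return None

    fstab_dev = device_for("/etc/fstab", MOUNT_POINT)
    mounted_dev = device_for("/proc/mounts", MOUNT_POINT)

    print("fstab expects: %s" % fstab_dev)
    print("mounted now:   %s" % mounted_dev)
    if fstab_dev != mounted_dev:
        print("WARNING: fstab and the live mount disagree; a remount or")
        print("reboot could bring the node up on the wrong (empty) volume.")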

On Thu, Sep 21, 2017 at 7:58 AM, Paul Pollack <paul.pollack@klaviyo.com>
wrote:

> Thanks for the suggestions guys.
>
> Nicolas, I just checked nodetool listsnapshots and it doesn't seem like
> those are causing the increase:
>
> Snapshot Details:
> Snapshot name                              Keyspace name  Column family name           True size  Size on disk
> 1479343904106-statistic_segment_timeline  klaviyo        statistic_segment_timeline   91.73 MiB  91.73 MiB
> 1479343904516-statistic_segment_timeline  klaviyo        statistic_segment_timeline   69.42 MiB  69.42 MiB
> 1479343904607-statistic_segment_timeline  klaviyo        statistic_segment_timeline   69.43 MiB  69.43 MiB
>
> Total TrueDiskSpaceUsed: 91.77 MiB
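>
> As a rough cross-check on disk (just a sketch, and it assumes the
> default data directory /var/lib/cassandra/data), the total size of
> everything under the snapshots/ directories can be summed directly.
> Snapshot files may be hardlinks shared with live sstables, so this is
> an upper bound rather than a true exclusive size:
>
>     import os
>
>     DATA_DIR = "/var/lib/cassandra/data"  # adjust to data_file_directories
>
>     total = 0
>     for root, dirs, files in os.walk(DATA_DIR):
>         # snapshot contents live under <keyspace>/<table>/snapshots/<name>/
>         if "snapshots" in root.split(os.sep):
>             total += sum(os.path.getsize(os.path.join(root, f))
>                          for f in files)
>
>     print("snapshot bytes on disk (upper bound): %d" % total)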
>
> Kurt, we definitely do have a large backlog of compactions, but I would
> expect only the currently running compactions to take up to 2x extra
> space, and for that space to be freed once each one completes -- is
> that an inaccurate picture of how compaction actually works? When the
> disk was almost full at 2TB I grew the EBS volume to 3TB, and it's
> already at 2.6TB, so I think it's only a matter of hours before the
> rest of the volume fills up as well. The largest files on disk are
> *-big-Data.db files. Is there anything else I can check that might
> indicate whether or not the repair is really the root cause of this
> issue?
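>
> (To be clear about what I mean by the 2x: the rough arithmetic I have
> in mind is the sketch below, with made-up numbers rather than our
> actual figures. With size-tiered compaction each in-flight compaction
> needs temporary space of roughly the combined size of its input
> sstables until they can be deleted, so a deep backlog of large
> compactions can need a lot of headroom, but that space should come
> back as each one finishes.)
>
>     # Hypothetical illustration of compaction headroom, not real data.
>     # Each entry is the total input size, in GB, of one in-flight compaction.
>     inflight_compactions_gb = [400, 250, 120]
>
>     # Worst case: little data expires or is overwritten, so the new sstable
>     # is about as large as its inputs until the old ones are removed.
>     temp_space_gb = sum(inflight_compactions_gb)
>     print("worst-case temporary space: ~%d GB" % temp_space_gb)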
>
> Thanks,
> Paul
>
> On Thu, Sep 21, 2017 at 4:02 AM, Nicolas Guyomar <
> nicolas.guyomar@gmail.com> wrote:
>
>> Hi Paul,
>>
>> This might be a long shot, but some repairs might fail to clear their
>> snapshot (I'm not sure whether that's still the case with C* 3.7; I had
>> the problem on the 2.X branch).
>> What does nodetool listsnapshots indicate?
>>
>> On 21 September 2017 at 05:49, kurt greaves <kurt@instaclustr.com> wrote:
>>
>>> Repair does overstream by design, so if that node is inconsistent you'd
>>> expect a bit of an increase. If you've got a backlog of compactions,
>>> that's probably due to the repair and likely the cause of the increase.
>>> If you're really worried you can do a rolling restart to stop the
>>> repair; otherwise maybe try increasing compaction throughput.
>>>
>>
>>
>
