From: David Wahler <dwahler@indeed.com>
Date: Fri, 20 Mar 2015 19:00:20 -0500
Subject: Re: Deleted snapshot files filling up /var/lib/cassandra
To: user@cassandra.apache.org
Content-Type: text/plain; charset=UTF-8

Sorry if I was unclear. The /var/lib/cassandra partition didn't
literally become 100% full. Our low-disk-space alarms started going
off, and that's when we noticed that the disk usage on several nodes
was steadily increasing much faster than expected. We restarted the
worst-affected node before it could run entirely out of free space,
and that brought the disk usage back down to the expected level, but
it immediately started creeping upwards again. The snapshot failures
have been happening periodically, starting before we noticed the high
disk usage.
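
For anyone who wants to check a node for the same symptom, here is a rough
sketch of how to total up the space held by files that have been deleted but
are still open in a process. It is only an illustration, not something taken
from our tooling: it assumes Linux with /proc mounted, and the function name
deleted_open_files is made up for this example. It needs to run as the same
user as the target process (or as root), and the process ID is whatever your
Cassandra JVM happens to be.

```python
import os
import sys

DELETED_SUFFIX = " (deleted)"


def deleted_open_files(pid, proc_root="/proc"):
    """Return (path, size_in_bytes) pairs for files that the given process
    still holds open even though they have been unlinked (Linux only)."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    seen = set()
    results = []
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
            if not target.endswith(DELETED_SUFFIX):
                continue
            # stat() follows the fd link to the inode that is still open.
            st = os.stat(link)
        except OSError:
            # The descriptor was closed while we were scanning; skip it.
            continue
        key = (st.st_dev, st.st_ino)
        if key in seen:
            # Several descriptors can point at the same file; count it once.
            continue
        seen.add(key)
        results.append((target[: -len(DELETED_SUFFIX)], st.st_size))
    return results


if __name__ == "__main__" and len(sys.argv) == 2:
    files = deleted_open_files(int(sys.argv[1]))
    for path, size in sorted(files, key=lambda item: item[1], reverse=True):
        print(f"{size:>15}  {path}")
    print(f"total bytes held by deleted files: {sum(s for _, s in files)}")
```

Run it with the process ID as the only argument. The sizes are apparent file
sizes (st_size), so sparse files may occupy less space on disk than reported.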

I'll do some more digging through the logs to see if the exceptions
are recurring, and open a Jira ticket once I have more information.

On Fri, Mar 20, 2015 at 11:52 AM, Ben Bromhead <ben@instaclustr.com> wrote:
> Sorry for the late reply.
>
> To immediately solve the problem you can restart Cassandra and all the open
> file descriptors to the deleted snapshots should disappear.
>
> As for why it happened, I would first address the disk space issue and see
> whether the snapshot errors and open file descriptors still occur (I am
> unclear as to whether you got the snapshot exception before or after the
> disk filled up). If repair still does not let go of snapshotted files even
> with free disk space, I would raise a ticket in Jira.
>
> On 17 March 2015 at 12:46, David Wahler <dwahler@indeed.com> wrote:
>>
>> On Mon, Mar 16, 2015 at 6:51 PM, Ben Bromhead <ben@instaclustr.com> wrote:
>> > If you are running a sequential repair (or have previously run a
>> > sequential repair that is still running) Cassandra will still have the
>> > file descriptors open for files in the snapshot it is using for the
>> > repair operation.
>>
>> Yeah, that aligns with my understanding of how the repair process
>> works. But the cluster has no repair sessions active (I think; when I
>> run "nodetool tpstats", the AntiEntropyStage and AntiEntropySessions
>> values are zero on all nodes) and the space still hasn't been freed.