From: David Wahler
Date: Fri, 20 Mar 2015 19:00:20 -0500
Subject: Re: Deleted snapshot files filling up /var/lib/cassandra
To: user@cassandra.apache.org

Sorry if I was unclear. The /var/lib/cassandra partition didn't
literally become 100% full. Our low-disk-space alarms started going
off, and that's when we noticed that the disk usage on several nodes
was steadily increasing much faster than expected. We restarted the
worst-affected node before it could run entirely out of free space,
and that brought the disk usage back down to the expected level, but
it immediately started creeping upwards again.

The snapshot failures have been happening periodically, starting
before we noticed the high disk usage. I'll do some more digging
through the logs to see if the exceptions are recurring, and open a
Jira ticket once I have more information.

On Fri, Mar 20, 2015 at 11:52 AM, Ben Bromhead wrote:
> Sorry for the late reply.
>
> To immediately solve the problem you can restart Cassandra and all the open
> file descriptors to the deleted snapshots should disappear.
>
> As for why it happened I would first address the disk space issue and see if
> the snapshot errors + open file descriptors issue still occurs (I am unclear
> as to whether you got the snapshot exception after the disk filled up or
> before), if you still have issues with repair not letting go of snapshotted
> files even with free disk space I would look to raise a ticket in Jira.
>
> On 17 March 2015 at 12:46, David Wahler wrote:
>>
>> On Mon, Mar 16, 2015 at 6:51 PM, Ben Bromhead wrote:
>> > If you are running a sequential repair (or have previously run a
>> > sequential repair that is still running) Cassandra will still have the
>> > file descriptors open for files in the snapshot it is using for the
>> > repair operation.
>>
>> Yeah, that aligns with my understanding of how the repair process
>> works. But the cluster has no repair sessions active (I think; when I
>> run "nodetool tpstats", the AntiEntropyStage and AntiEntropySessions
>> values are zero on all nodes) and the space still hasn't been freed.
>
> --
> Ben Bromhead
> Instaclustr | www.instaclustr.com | @instaclustr | (650) 284 9692
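
For anyone following this thread: the symptom discussed above (disk space held by files that were deleted while a process still has them open) can be checked without restarting the node. The following is only a minimal sketch, assuming a Linux host, where /proc/<pid>/fd holds one symbolic link per open descriptor and the link target of an unlinked file ends in " (deleted)". The function name and its parameters are hypothetical and are not part of Cassandra or nodetool.

```python
import os

# Linux appends this marker to the link target of an unlinked-but-open file.
DELETED_SUFFIX = " (deleted)"


def find_deleted_open_files(pid, path_prefix="/var/lib/cassandra", proc_root="/proc"):
    """Return sorted (fd, path, size_bytes) tuples for files under path_prefix
    that the given process still holds open after they were deleted.

    size_bytes is None when the size cannot be read.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    results = []
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
        except OSError:
            continue  # descriptor was closed while we were scanning
        if not target.endswith(DELETED_SUFFIX):
            continue  # file still exists on disk (or is a socket, pipe, ...)
        path = target[: -len(DELETED_SUFFIX)]
        if not path.startswith(path_prefix):
            continue
        try:
            # On a real /proc this follows the link to the still-open inode.
            size = os.stat(link).st_size
        except OSError:
            size = None
        results.append((int(name), path, size))
    return sorted(results)
```

Calling find_deleted_open_files(pid) with the Cassandra process ID (as root or as the user running Cassandra) and summing the sizes gives a rough figure for how much space would be released by a restart. Filtering the returned paths on "/snapshots/" narrows the list to snapshot files.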