Subject: Re: lots of extra bytes on disk
From: Ben Chobot
Date: Thu, 28 Mar 2013 11:03:04 -0700
To: user@cassandra.apache.org

Actually, due to a misconfiguration, we weren't snapshotting at all on some
of the nodes that are experiencing this problem. So while we've fixed that,
snapshots don't explain the problem.
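(For concreteness, this is roughly the comparison in question, per node. It's
only a sketch: it assumes the default /var/lib/cassandra/data layout with
1.1's per-CF directories, and the keyspace/column family names below are
placeholders.)

    # placeholders: substitute your keyspace and column family
    KS=MyKeyspace
    CF=MyColumnFamily

    # what Cassandra believes the CF occupies: "Space used (live)" / "(total)"
    nodetool -h localhost cfstats

    # what is actually on disk for that CF, and how much of it is snapshots
    du -sh /var/lib/cassandra/data/$KS/$CF
    du -sh /var/lib/cassandra/data/$KS/$CF/snapshots 2>/dev/null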
On Mar 28, 2013, at 10:54 AM, Hiller, Dean wrote:

> Have you cleaned up your snapshots? Those take extra space and don't just
> go away unless you delete them.
>
> Dean
>
> On 3/28/13 11:46 AM, "Ben Chobot" wrote:
>
>> Are you also running 1.1.5? I'm wondering (ok hoping) that this might be
>> fixed if I upgrade.
>>
>> On Mar 28, 2013, at 8:53 AM, Lanny Ripple wrote:
>>
>>> We occasionally (twice now on a 40 node cluster over the last 6-8
>>> months) see this. My best guess is that Cassandra can fail to mark an
>>> SSTable for cleanup somehow. Forced GCs or reboots don't clear them
>>> out. We disable thrift and gossip; drain; snapshot; shutdown; clear
>>> data/Keyspace/Table/*.db and restore (hard-linking back into place to
>>> avoid data transfer) from the just-created snapshot; restart.
>>>
>>>
>>> On Mar 28, 2013, at 10:12 AM, Ben Chobot wrote:
>>>
>>>> Some of my Cassandra nodes in my 1.1.5 cluster show a large
>>>> discrepancy between what Cassandra says the SSTables should sum up to
>>>> and what df and du claim exist. During repairs this is almost always
>>>> pretty bad, but post-repair compactions tend to bring those numbers to
>>>> within a few percent of each other... usually. Sometimes they remain
>>>> much further apart after compactions have finished - for instance, I'm
>>>> looking at one node now that claims to have 205GB of SSTables but
>>>> actually has 450GB of files living in that CF's data directory. No
>>>> pending compactions, and the most recent compaction for this CF
>>>> finished just a few hours ago.
>>>>
>>>> nodetool cleanup has no effect.
>>>>
>>>> What could be causing these extra bytes, and how do I get them to go
>>>> away? I'm ok with a few extra GB of unexplained data, but an extra
>>>> 245GB (more than all the data this node is supposed to have!) is a
>>>> little extreme.
>>>
>>
>
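(For anyone who wants to try what Lanny describes, the snapshot-and-hardlink
rebuild works out to roughly the per-node sequence below. Treat it as a
sketch: the keyspace/column family names and the /var/lib/cassandra/data path
are placeholders, the service commands depend on how you run Cassandra, and
the snapshot flags are the 1.1-era nodetool syntax.)

    # placeholders: substitute your keyspace and column family
    KS=MyKeyspace
    CF=MyColumnFamily
    DIR=/var/lib/cassandra/data/$KS/$CF

    # stop client and cluster traffic, then flush everything to disk
    nodetool -h localhost disablethrift
    nodetool -h localhost disablegossip
    nodetool -h localhost drain

    # the snapshot hard-links only the SSTables Cassandra still knows about,
    # so any orphaned .db files it has lost track of are left behind
    nodetool -h localhost snapshot $KS -t rebuild

    sudo service cassandra stop

    # clear the live directory, then hard-link the known SSTables back from
    # the snapshot (hard links, so no data is actually copied)
    rm $DIR/*.db
    ln $DIR/snapshots/rebuild/* $DIR/

    sudo service cassandra start

    # once the node looks healthy, the "rebuild" snapshot can be removed
    # with nodetool clearsnapshot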