Subject: Re: Sudden increase in diskspace usage
From: Nicolai Gylling
Date: Fri, 10 May 2013 10:33:11 +0200
To: user@cassandra.apache.org

> On Wed, May 8, 2013 at 10:43 PM, Nicolai Gylling wrote:
>> At the time of normal operation there was 800 GB free space on each
>> node. After the crash, C* started using a lot more, resulting in an
>> out-of-diskspace situation on 2 nodes, i.e. C* used up the 800 GB in
>> just 2 days, giving us very little time to do anything about it,
>> since repairs/joins take a considerable amount of time.
>
> Did someone do a repair? Repair very frequently results in (usually
> temporary) >2x disk consumption.

Repair runs regularly, once a week, and normally doesn't take up much
space, as we're using Leveled Compaction Strategy.

>> What can make C* suddenly use this amount of disk-space? We did see
>> a lot of pending compactions on one node (7k).
>
> Mostly repair.
>
>> Any tips on recovering from an out-of-diskspace-on-multiple-nodes
>> situation? I've tried moving some SSTables away, but C* seems to use
>> whatever space I free up in no time. I'm not sure if any of the
>> nodes is fully updated, as 'nodetool status' reports 3 different
>> loads.
>
> A relevant note here is that moving sstables out of the full partition
> while Cassandra is running will not result in any space recovery,
> because Cassandra still has an open filehandle to that sstable. In
> order to deal with an out-of-disk-space condition you need to stop
> Cassandra. Unfortunately the JVM stops responding to clean shutdown
> requests when the disk is full, so you will have to kill -KILL the
> process.
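Before stopping a node, something like the sketch below should show
which moved-away/deleted sstables the JVM is still holding open. It's
only a rough sketch, and Linux-only since it walks /proc/<pid>/fd; you
pass the Cassandra PID yourself:

    # list sstable data files that were unlinked/moved away but are
    # still held open by the Cassandra JVM (run as a user that can
    # read /proc/<pid>/fd, typically root or the cassandra user)
    import os
    import sys

    pid = sys.argv[1]  # Cassandra PID, e.g. from `pgrep -f CassandraDaemon`
    fd_dir = "/proc/%s/fd" % pid
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed while we were scanning
        # the kernel suffixes unlinked-but-still-open files like this
        if target.endswith(" (deleted)") and "-Data.db" in target:
            print(target)

Until those filehandles are gone the space isn't actually freed, which
would explain why whatever space I free up disappears again in no time.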
> If you have a lot of overwrites/fragmentation, you could attempt to
> clear enough space to do a major compaction of remaining data, do
> that major compaction, split your One Huge sstable with the
> (experimental) sstable_split tool, and then copy the temporarily
> moved sstables back onto the node. You could also attempt to use
> user-defined compaction (via the JMX endpoint) to strategically
> compact such data. If you grep for compaction in your logs, do you
> see compactions resulting in smaller output file sizes?
> ("compacted to X% of original" messages)
>
> I agree with Alexis Rodriguez that Cassandra 1.2.0 is not a version
> anyone should run, it contains significant bugs.
>
> =Rob

We're storing timeseries, so we don't have any overwrites and hardly
any reduction in size during compaction. I'll try to upgrade and see
if that can help get some diskspace back.

Suppose we're seeing some bug in C* where SSTables don't get deleted
during compaction (which I guess is the only explanation for this disk
consumption). Will C* 1.2.4 be able to fix this? Or would it be a
better solution to replace one node at a time, so we're sure to only
have the data that C* knows about?
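For reference, a quick way to check the compaction size reduction Rob
asked about, rather than eyeballing the log. This is only a sketch:
the regex assumes the "X bytes to Y (~Z% of original)" wording that
our 1.2-era system.log lines use, so adjust it if your format differs:

    # summarise compaction reduction ratios from system.log
    import re
    import sys

    log_path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/cassandra/system.log"
    pattern = re.compile(r"([\d,]+) bytes to ([\d,]+) \(~(\d+)% of original\)")

    ratios = []
    with open(log_path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                ratios.append(int(m.group(3)))

    if ratios:
        print("%d compactions, outputs %d%%-%d%% of input, average %.1f%%"
              % (len(ratios), min(ratios), max(ratios),
                 sum(ratios) / float(len(ratios))))
    else:
        print("no compaction 'of original' lines found in %s" % log_path)

In our case the ratios stay close to 100%, which fits pure timeseries
data with no overwrites.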