In-Reply-To: <4D7D0885.4070300@hiramoto.org>
References: <4D7D0885.4070300@hiramoto.org>
Date: Mon, 14 Mar 2011 15:33:51 +0100
Message-ID: <AANLkTi=kznZSsH+C795ffR=xX_zAs8-kS+-sEdC+6_Fw@mail.gmail.com>
Subject: Re: reducing disk usage advice
From: Sylvain Lebresne <sylvain@datastax.com>
To: user@cassandra.apache.org
Cc: Karl Hiramoto <karl@hiramoto.org>

On Sun, Mar 13, 2011 at 7:10 PM, Karl Hiramoto <karl@hiramoto.org> wrote:
>
> Hi,
>
> I'm looking for advice on reducing disk usage. I've ran out of disk
> space two days in a row while running a nightly scheduled
> nodetool repair && nodetool compact cronjob.
>
> I have 6 nodes RF=3 each with 300 GB drives at a hosting company.
> GCGraceSeconds = 260000 (3.1 days)
>
> Every column in the database has a TTL of 86400 (24 hours) to handle
> deletion of stale data. 50% of the time the data is only written
> once, read 0 or many times then expires. The other 50% of the time
> it's written multiple times, resetting the TTL to 24 hours each time.

As it turns out, the compaction algorithm is pretty much the worst
possible for this use case. Because we compact files that have a
similar size, the older a column gets, the less often it is compacted.
If you always set a fixed TTL for all columns, you would want to do
some compaction of recent sstables, for the sake of not having too many
sstables, but you also want to compact old sstables that are
guaranteed to just go away. And for those, it's actually fine to
compact them alone (only for the sake of purging).
But the way compaction works, you will end up with big sstables of
stuff that is expired, and you may not even be able to compact them,
simply because compaction "thinks" it doesn't have enough room.
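
To illustrate the idea of compacting old sstables alone just to purge
them, here is a minimal sketch in Python. It is not Cassandra code: the
SSTable class, its max_write_time field and the fully_expired function
are hypothetical names made up for this example. It assumes every
column has the same TTL and is never deleted manually, and it ignores
the checks that real compaction has to make against rows living in
other sstables.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    # Hypothetical model of an sstable, for illustration only.
    name: str
    size_bytes: int
    max_write_time: int  # write time (seconds) of the newest column in the file


def fully_expired(sstables, now, ttl, gc_grace):
    """Return the sstables whose newest column is already past ttl + gc_grace.

    With a fixed TTL on every column, such a file contains nothing that
    still needs to be kept, so it could be compacted on its own.
    """
    return [s for s in sstables if s.max_write_time + ttl + gc_grace <= now]


tables = [
    SSTable("old-big", 40_000_000_000, max_write_time=0),
    SSTable("recent-small", 500_000_000, max_write_time=300_000),
]
now = 400_000
purgeable = fully_expired(tables, now, ttl=86_400, gc_grace=260_000)
print([s.name for s in purgeable])
```

Here only the old, large file qualifies, which is exactly the kind of
file that size-based compaction rarely touches.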

But I do think that your use case (having a CF where all columns have
the same TTL and you only rely on it for deletion) is a very useful
one, and we should handle it better. In particular, CASSANDRA-1610
could be an easy way to get this.

CASSANDRA-1537 is probably also a partial but possibly sufficient
solution. It's also probably easier than CASSANDRA-1610 and I'll try
to give it a shot asap; it has been on my todo list for way too long.

> One question, since I use a TTL is it safe to set GCGraceSeconds to
> 0? I don't manually delete ever, I just rely on the TTL for deletion,
> so are forgotten deletes an issue?

The rule is this. Say you think that m is a reasonable value for
GCGraceSeconds. That is, you make sure that you'll always bring failed
nodes back up and run repair within m seconds. Then, if you always use
a TTL of n (in your case 24 hours), the actual GCGraceSeconds that you
should set is m - n.

So setting a GCGrace of 0 in your case would be roughly equivalent to
setting a GCGrace of 24h on a "normal" CF. That's probably a bit low.
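
The m - n rule can be worked through with the numbers from this thread.
The following is a small Python sketch of the arithmetic only; the
function names are made up for illustration, and clamping the result at
0 is an added assumption for the case where the TTL exceeds m.

```python
def gc_grace_for_ttl_cf(m_seconds, ttl_seconds):
    """GCGraceSeconds to set on a CF where every column has TTL n: m - n.

    The clamp at 0 is an assumption added here, not part of the rule.
    """
    return max(0, m_seconds - ttl_seconds)


def equivalent_normal_gc_grace(gc_grace_seconds, ttl_seconds):
    """The GCGraceSeconds on a "normal" CF that a given setting corresponds to."""
    return gc_grace_seconds + ttl_seconds


TTL = 86_400  # 24 hours, as in the original question

# If 260000 seconds is the value of m you are comfortable with:
print(gc_grace_for_ttl_cf(260_000, TTL))   # 173600

# Setting GCGraceSeconds to 0 behaves roughly like 24h on a normal CF:
print(equivalent_normal_gc_grace(0, TTL))  # 86400
```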

--
Sylvain


>
>
>
> cfstats:
>  Read Count: 32052
>         Read Latency: 3.1280378135529765 ms.
>         Write Count: 9704525
>         Write Latency: 0.009527474760485443 ms.
>         Pending Tasks: 0
>                 Column Family: Offer
>                 SSTable count: 12
>                 Space used (live): 59865089091
>                 Space used (total): 76111577830
>                 Memtable Columns Count: 39355
>                 Memtable Data Size: 14726313
>                 Memtable Switch Count: 414
>                 Read Count: 32052
>                 Read Latency: 3.128 ms.
>                 Write Count: 9704525
>                 Write Latency: 0.010 ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 1000
>                 Key cache size: 1000
>                 Key cache hit rate: 2.4805931214280473E-4
>                 Row cache: disabled
>                 Compacted row minimum size: 36
>                 Compacted row maximum size: 1597
>                 Compacted row mean size: 1319
>