incubator-cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: Cassandra disk space utilization WAY higher than I would expect
Date Tue, 27 Jul 2010 12:58:39 GMT
On Fri, Jul 23, 2010 at 8:57 AM, Julie <> wrote:
> But in my focused testing today I see that if I run nodetool "cleanup" on the
> nodes taking up way more space than I expect, I see multiple SSTables being
> combined into 1 or 2 and the live disk usage going way down, down to what I know
> the raw data requires.
> This is great news!  I haven't tested it on hugely bloated nodes yet (where the
> disk usage is 6X the size of the raw data) since I haven't reproduced that
> problem today, but I would think using nodetool "cleanup" will work.
> I just have two questions:
>       (1) How can I set up Cassandra to do this automatically, to allow my
> nodes to store more data?

You'd have to run nodetool cleanup from cron or a similar external
scheduler; Cassandra won't schedule it for you.
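A minimal sketch of such a cron job follows. It assumes nodetool is on the PATH and selects the target node with "-h <host>"; check the usage output of nodetool on your version before relying on that flag. The host list and script path are hypothetical placeholders. It cleans up one node at a time because cleanup rewrites SSTables and is I/O heavy.

```python
import subprocess
import sys
from typing import List

# Hypothetical host list; replace with your own nodes.
HOSTS = ["10.0.0.1", "10.0.0.2"]


def build_cleanup_command(host: str, nodetool: str = "nodetool") -> List[str]:
    # Assumption: nodetool takes the target node via "-h <host>".
    return [nodetool, "-h", host, "cleanup"]


def run_cleanup(hosts: List[str]) -> int:
    # Run cleanup sequentially and return how many nodes failed.
    failures = 0
    for host in hosts:
        try:
            result = subprocess.run(build_cleanup_command(host))
        except OSError as exc:  # e.g. nodetool not found on PATH
            print(f"could not run nodetool for {host}: {exc}", file=sys.stderr)
            failures += 1
            continue
        if result.returncode != 0:
            print(f"cleanup failed on {host} (exit {result.returncode})", file=sys.stderr)
            failures += 1
    return failures


if __name__ == "__main__":
    # Example crontab entry (hypothetical path), running nightly at 03:00:
    #   0 3 * * * /usr/bin/python3 /opt/scripts/cassandra_cleanup.py
    failed = run_cleanup(HOSTS)
    if failed:
        # cron normally mails anything written to stderr to the job's owner.
        print(f"{failed} node(s) failed cleanup", file=sys.stderr)
```

Staggering the nodes this way keeps only one node doing the extra disk I/O at any moment; running them all in parallel would finish sooner but load the whole cluster at once.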

>       (2) I am a bit confused why cleanup is working this way since the doc
> claims it just cleans up keys no longer belonging to this node.  I have 8 nodes
> and do a simple sequential write of 10,000 keys to each of them.  I'm using
> random partitioning and give each node an Initial Token that should force even
> spacing of tokens around the hash space:
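Julie's actual token list isn't quoted here. For reference, a sketch of the usual way to compute evenly spaced InitialToken values for the random partitioner, whose tokens lie in [0, 2**127): node i gets i * 2**127 / N.

```python
from typing import List

# RandomPartitioner tokens lie in the range [0, 2**127).
TOKEN_SPACE = 2 ** 127


def initial_tokens(node_count: int) -> List[int]:
    # Node i gets token i * 2**127 / N, so every node owns an equal slice of the ring.
    if node_count < 1:
        raise ValueError("node_count must be positive")
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


for i, token in enumerate(initial_tokens(8)):
    print(f"node {i}: InitialToken {token}")
```

For 8 nodes the spacing is exactly 2**124, so integer division loses nothing.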

a) cleanup is a superset of compaction: it merges SSTables and discards
overwritten versions of your data, so if you've been doing overwrites at
all it will reduce space used for that reason alone
b) if you have added, moved, or removed any nodes then you will have
"keys no longer belonging to this node"
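To make the two points concrete, here is a deliberately simplified model (not Cassandra's actual code): an SSTable is a list of (key, timestamp, value) rows, compaction keeps only the newest version of each key, and cleanup does the same merge but also drops rows whose token falls outside the node's range. The token function is a simplified stand-in for the random partitioner (MD5 of the key reduced into [0, 2**127)), and all function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# A row version: (key, timestamp, value). Real SSTables hold columns,
# tombstones, etc.; this model ignores all of that.
Row = Tuple[str, int, str]

TOKEN_SPACE = 2 ** 127  # random partitioner token range is [0, 2**127)


def token_for(key: str) -> int:
    # Simplified stand-in for the random partitioner: MD5 of the key,
    # reduced into the token space.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big") % TOKEN_SPACE


def owns(token: int, prev_token: int, my_token: int) -> bool:
    # A node owns (prev_token, my_token], wrapping around the ring.
    # prev_token == my_token means a single node owns the whole ring.
    if prev_token < my_token:
        return prev_token < token <= my_token
    return token > prev_token or token <= my_token


def compact(sstables: List[List[Row]]) -> List[Row]:
    # Merge all SSTables, keeping only the newest version of each key.
    latest: Dict[str, Row] = {}
    for table in sstables:
        for key, ts, value in table:
            if key not in latest or ts > latest[key][1]:
                latest[key] = (key, ts, value)
    return sorted(latest.values())


def cleanup(sstables: List[List[Row]], prev_token: int, my_token: int) -> List[Row]:
    # Cleanup = compaction + discarding rows this node no longer owns.
    return [row for row in compact(sstables)
            if owns(token_for(row[0]), prev_token, my_token)]
```

With overwrites (key "a" written twice below), compaction alone shrinks the data; cleanup shrinks it further only when the node's range has changed since the data was written.

```python
tables = [[("a", 1, "x"), ("b", 1, "y")], [("a", 2, "z")]]
print(compact(tables))  # [('a', 2, 'z'), ('b', 1, 'y')]
```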

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
