Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
In-Reply-To: <loom.20100819T161509-41@post.gmane.org>
Date: Thu, 19 Aug 2010 12:16:09 -0700
Message-ID: <AANLkTi=3dD6umfZQAu4H4PGLDbj5kc+mvUFwpOTA6KW4@mail.gmail.com>
Subject: Re: Cassandra disk space utilization WAY higher than I would expect
From: Robert Coli <rcoli@digg.com>
To: user@cassandra.apache.org

On Thu, Aug 19, 2010 at 7:23 AM, Julie <julie.sugar@nextcentury.com> wrote:
> At this point, I logged in. The data distribution on this node was 122 GB. I
> started performing a manual nodetool cleanup.

Have you checked the size of the Hinted Handoff CF? If your nodes are
flapping (repeatedly being marked down and back up) under sustained
writes, they could be storing a non-trivial number of hinted handoff
rows. That probably doesn't account for 5x usage, though.
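`nodetool cfstats` reports space used per column family, which covers
HintsColumnFamily. If you'd rather look at the files directly, here is
a minimal sketch that sums the on-disk size of the hints SSTables. It
assumes the 0.6-era layout, where hints live in the "system" keyspace
as HintsColumnFamily-<generation>-<Component>.db files; the default
data directory below is a hypothetical placeholder for your configured
one.

```python
import os
import sys

# Assumption: 0.6-era on-disk layout. Hints are stored in the "system"
# keyspace, and every SSTable component of that CF is a file whose name
# starts with "HintsColumnFamily-".
HINTS_PREFIX = "HintsColumnFamily-"


def hints_cf_bytes(data_dir):
    """Return the total on-disk size, in bytes, of HintsColumnFamily files."""
    system_dir = os.path.join(data_dir, "system")
    total = 0
    for name in os.listdir(system_dir):
        path = os.path.join(system_dir, name)
        # Count only regular files belonging to the hints column family.
        if name.startswith(HINTS_PREFIX) and os.path.isfile(path):
            total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    # Hypothetical default path; pass your configured data directory instead.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/cassandra/data"
    if os.path.isdir(os.path.join(data_dir, "system")):
        size = hints_cf_bytes(data_dir)
        print("HintsColumnFamily on disk: %.2f GB" % (size / 1024.0 ** 3))
    else:
        print("no system keyspace directory under %s" % data_dir)
```

If that number is a large fraction of the node's total, hints are a real
contributor; if it's small, look elsewhere.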

http://wiki.apache.org/cassandra/Operations
"
The reason why you run nodetool cleanup on all live nodes [after
replacing a node] is to remove old Hinted Handoff writes stored for
the dead node.
"

You could relatively quickly determine whether Hinted Handoff is
implicated by running your test with the feature turned off.
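A minimal sketch of flipping that switch, assuming a 0.6-style
storage-conf.xml containing a single <HintedHandoffEnabled> element
(0.7 moved this to hinted_handoff_enabled in cassandra.yaml). The
function name is hypothetical. It uses a text substitution rather than
an XML parser so the comments in the config file survive the rewrite.

```python
import re

# Assumption: a 0.6-style storage-conf.xml with exactly one
# <HintedHandoffEnabled>true|false</HintedHandoffEnabled> element.
_HH_ELEMENT = re.compile(
    r"(<HintedHandoffEnabled>)\s*(?:true|false)\s*(</HintedHandoffEnabled>)",
    re.IGNORECASE,
)


def set_hinted_handoff(xml_text, enabled):
    """Return xml_text with HintedHandoffEnabled set to the given value."""
    value = "true" if enabled else "false"
    # \g<1> and \g<2> keep the original opening and closing tags.
    new_text, count = _HH_ELEMENT.subn(r"\g<1>" + value + r"\g<2>", xml_text)
    if count != 1:
        raise ValueError(
            "expected exactly one HintedHandoffEnabled element, found %d" % count
        )
    return new_text
```

Read the file, pass its contents through set_hinted_handoff(text, False),
write it back on every node, and restart before rerunning the test.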

https://issues.apache.org/jira/browse/CASSANDRA-894

=Rob