Date: Wed, 18 Aug 2010 19:57:20 +0200
Message-ID: <AANLkTimB-8=cZ+imUemAz4d_8d8x3wReu+u4C+DNE4Zu@mail.gmail.com>
Subject: Re: Cassandra disk space utilization WAY higher than I would expect
From: Peter Schuller <peter.schuller@infidyne.com>
To: user@cassandra.apache.org

> I actually have the log files from all 8 nodes if it helps to diagnose what
> activity was going on behind the scenes.  I really need to understand how this
> happened.

Without necessarily dumping all the information - approximately what
do they contain? Do they contain anything about compactions,
anti-compactions, streaming, etc?

On a node that has gone idle after taking writes, I *think* the only
expected disk I/O (once it has settled) would be a memtable flush
triggered by memtable_flush_after_mins, and possibly compactions
resulting from that flush (depending on how close one was to
triggering compaction before it). Whatever is causing additional
sstables to be written, even if it is somehow being triggered
incorrectly, I'd hope it still shows up in the logs.
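
To illustrate the "how close one was to triggering compaction" point,
here is a minimal, simplified sketch of size-tiered compaction
triggering. It is an assumption loosely modeled on Cassandra 0.6-era
behaviour (sstables bucketed by similar size, a bucket compacted once
it holds at least a minimum number of sstables, assumed here to be 4);
the function names are hypothetical and this is not the actual
Cassandra code.

```python
"""Simplified, hypothetical model of size-tiered compaction triggering.

Assumption: sstables are grouped into buckets of similar size, and a
bucket becomes eligible for compaction once it holds at least
MIN_COMPACTION_THRESHOLD sstables. Not the real Cassandra implementation.
"""

MIN_COMPACTION_THRESHOLD = 4  # assumed minimum number of sstables per bucket


def bucket_sstables(sizes):
    """Group sstable sizes into buckets of roughly similar size.

    A size joins the first bucket whose current average it lies within
    50% of; otherwise it starts a new bucket.
    """
    buckets = []  # each bucket is a list of sstable sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg / 2 < size < avg * 3 / 2:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def buckets_to_compact(sizes, threshold=MIN_COMPACTION_THRESHOLD):
    """Return the buckets that have reached the compaction threshold."""
    return [b for b in bucket_sstables(sizes) if len(b) >= threshold]


def flush_then_check(existing_sizes, flushed_size,
                     threshold=MIN_COMPACTION_THRESHOLD):
    """Model a memtable flush adding one sstable, then check for compaction."""
    return buckets_to_compact(existing_sizes + [flushed_size], threshold)


if __name__ == "__main__":
    # Three similar sstables already on disk: one more flush tips the bucket.
    print(flush_then_check([10, 10, 10], 11))
    # Only two similar sstables: the flush alone does not trigger compaction.
    print(flush_then_check([10, 10], 11))
```

So under these assumptions, an idle node whose timed memtable flush
produces the fourth similarly-sized sstable would kick off a compaction
(and write a new, larger sstable), while one with fewer would not.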

What about something like a gossip issue leading to some kind of
disagreement about token space? But even then, why would nodes
spontaneously start pushing data? My understanding is that streaming
is currently only triggered by administrative operations, which seems
to be confirmed by:

   http://wiki.apache.org/cassandra/Streaming

Assuming the log files do show some kind of activity such as
compaction, streaming, etc., does it correlate well in time across
nodes and/or with something else?
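
One way to check that correlation is to merge the relevant lines from
all 8 nodes into a single timeline. Below is a minimal sketch; it
assumes each log line carries a timestamp like "2010-08-18 17:12:34,567"
(adjust the regex to the actual log4j pattern in use), and the keyword
list and function names are hypothetical. Note that plain substring
matching also hits thread names such as a compaction pool, which is
usually what you want here but may over-match.

```python
import re
from datetime import datetime

# Assumed timestamp format, e.g. "2010-08-18 17:12:34,567".
TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")

# Hypothetical keywords for background activity of interest
# ("compact" also matches anti-compaction messages).
KEYWORDS = ("compact", "stream", "flush")


def parse_timestamp(line):
    """Extract the timestamp from a log line, or None if there is none."""
    m = TIMESTAMP_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return ts.replace(microsecond=int(m.group(2)) * 1000)


def merged_timeline(logs_by_node, keywords=KEYWORDS):
    """Merge relevant log lines from all nodes into one time-ordered list.

    logs_by_node maps a node name to an iterable of raw log lines.
    Returns (timestamp, node, category, line) tuples sorted by time.
    """
    events = []
    for node, lines in logs_by_node.items():
        for line in lines:
            lowered = line.lower()
            # First matching keyword decides the category.
            category = next((k for k in keywords if k in lowered), None)
            if category is None:
                continue
            ts = parse_timestamp(line)
            if ts is not None:
                events.append((ts, node, category, line.rstrip()))
    events.sort(key=lambda e: (e[0], e[1]))
    return events


if __name__ == "__main__":
    import sys
    # Usage: python timeline.py node1.log node2.log ...
    logs = {}
    for path in sys.argv[1:]:
        with open(path) as f:
            logs[path] = f.readlines()
    for ts, node, category, line in merged_timeline(logs):
        print(ts.isoformat(), node, category, line, sep="\t")
```

Bursts of compaction or streaming that line up across several nodes
would point at something cluster-wide rather than a single node's
flush schedule.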

-- 
/ Peter Schuller