From: Peter Schuller
To: user@cassandra.apache.org
Date: Wed, 18 Aug 2010 19:57:20 +0200
Subject: Re: Cassandra disk space utilization WAY higher than I would expect
References: <4C5C8370.7070107@digg.com>

> I actually have the log files from all 8 nodes
> if it helps to diagnose what
> activity was going on behind the scenes. I really need to understand how this
> happened.

Without necessarily dumping all the information - approximately what
do they contain? Do they contain anything about compactions,
anti-compactions, streaming, etc?

With an idle node after taking writes, I *think* the only expected
disk I/O (once it has settled) would be a memtable flush triggered by
memtable_flush_after_mins, and possibly compactions resulting from
that (depending on how close the node was to triggering compaction
prior to the memtable flush). Whatever is causing additional sstables
to be written, even if somehow triggered incorrectly, I'd hope that it
would still be logged.

What about something like a gossiping issue with some kind of
disagreement about token space? But even then, why would nodes
spontaneously start pushing data - my understanding is that this is
only triggered by administrative operations right now, which seems
confirmed by:

   http://wiki.apache.org/cassandra/Streaming

Assuming the log files contain some kind of activity such as
compaction/streaming/etc., do they correlate well in time with each
other and/or something else?

-- 
/ Peter Schuller
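
For the correlation question above, here is a rough sketch of how one
might check whether compaction/streaming/flush activity lines up in time
across nodes. It is only an illustration under assumptions: the log line
shape (log4j-style, with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp), the
keyword list, and all function names are hypothetical and not taken from
the actual logs in this thread.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed log line shape (not confirmed by the thread), e.g.:
#  INFO [COMPACTION-POOL:1] 2010-08-18 10:57:20,123 CompactionManager.java (line 246) Compacting [...]
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")

# Hypothetical keyword list; "compact" also matches anti-compaction lines.
KEYWORDS = ("compact", "stream", "flush")


def extract_events(node, lines):
    """Yield (timestamp, node, keyword) for each line that mentions a keyword."""
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # stack traces and continuation lines carry no timestamp
        lowered = line.lower()
        for keyword in KEYWORDS:
            if keyword in lowered:
                when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
                yield when, node, keyword
                break  # count each line once


def bucket_by_minute(events):
    """Group events into one-minute buckets: {minute: {node: [keywords]}}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for when, node, keyword in events:
        buckets[when.replace(second=0)][node].append(keyword)
    return buckets


def correlated_minutes(buckets, min_nodes=2):
    """Return the sorted minutes in which at least min_nodes nodes were active."""
    return sorted(m for m, nodes in buckets.items() if len(nodes) >= min_nodes)
```

Each node's log would be fed in as an iterable of lines (an open file
object works), the per-node events concatenated, and the result passed
through bucket_by_minute and correlated_minutes. Minutes where several
nodes show activity at once are the ones worth reading in full.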