From: Robert Coli
To: user@cassandra.apache.org
Date: Thu, 19 Aug 2010 12:16:09 -0700
Subject: Re: Cassandra disk space utilization WAY higher than I would expect

On Thu, Aug 19, 2010 at 7:23 AM, Julie wrote:
> At this point, I logged in.
> The data distribution on this node was 122GB. I
> started performing a manual nodetool cleanup.

Check the size of the Hinted Handoff CF? If your nodes are flapping
under sustained write, they could be storing a non-trivial number of
hinted handoff rows? Probably not 5x usage though..

http://wiki.apache.org/cassandra/Operations
"The reason why you run nodetool cleanup on all live nodes [after
replacing a node] is to remove old Hinted Handoff writes stored for
the dead node."

You could relatively quickly determine whether Hinted Handoff is
implicated by running your test with the feature turned off.

https://issues.apache.org/jira/browse/CASSANDRA-894

=Rob
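
One rough way to do the size check suggested above is to add up the on-disk files belonging to the hints column family and compare that with the node's total. The sketch below is only illustrative: the helper name cf_disk_usage is hypothetical, and it assumes that SSTable files sit directly in the system keyspace's data directory and are named with the column family as a prefix (for example "HintsColumnFamily-1-Data.db"). Both the directory layout and the column family name are assumptions that may differ between Cassandra versions, so check them against your own install.

```python
import os


def cf_disk_usage(data_dir, cf_name):
    """Return the total size in bytes of regular files in data_dir
    whose names start with '<cf_name>-'.

    Assumption: SSTable components are named '<cf_name>-<n>-<part>.db'
    and live directly in data_dir (not in subdirectories).
    """
    prefix = cf_name + "-"
    total = 0
    for name in os.listdir(data_dir):
        path = os.path.join(data_dir, name)
        if name.startswith(prefix) and os.path.isfile(path):
            total += os.path.getsize(path)
    return total
```

Called as, say, cf_disk_usage("/var/lib/cassandra/data/system", "HintsColumnFamily") (path and name again assumed), a result that is a large fraction of the 122GB would point at Hinted Handoff; a small one would suggest looking elsewhere.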