Subject: Re: lots of extra bytes on disk
From: Ben Chobot
Date: Thu, 28 Mar 2013 11:03:04 -0700
To: user@cassandra.apache.org

Actually, due to a misconfiguration, we weren't snapshotting at all on some
of the nodes that are experiencing this problem. So while we've fixed that,
snapshots don't explain the problem.
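(For concreteness, this is roughly the comparison in question, per node. It's
only a sketch: it assumes the default /var/lib/cassandra/data layout with
1.1's per-CF directories, and the keyspace/column family names below are
placeholders.)

    # placeholders: substitute your keyspace and column family
    KS=MyKeyspace
    CF=MyColumnFamily

    # what Cassandra believes the CF occupies: "Space used (live)" / "(total)"
    nodetool -h localhost cfstats

    # what is actually on disk for that CF, and how much of it is snapshots
    du -sh /var/lib/cassandra/data/$KS/$CF
    du -sh /var/lib/cassandra/data/$KS/$CF/snapshots 2>/dev/null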
On Mar 28, 2013, at 10:54 AM, Hiller, Dean wrote:

> Have you cleaned up your snapshots? Those take extra space and don't just
> go away unless you delete them.
>
> Dean
>
> On 3/28/13 11:46 AM, "Ben Chobot" wrote:
>
>> Are you also running 1.1.5? I'm wondering (ok hoping) that this might be
>> fixed if I upgrade.
>>
>> On Mar 28, 2013, at 8:53 AM, Lanny Ripple wrote:
>>
>>> We occasionally (twice now on a 40 node cluster over the last 6-8
>>> months) see this. My best guess is that Cassandra can fail to mark an
>>> SSTable for cleanup somehow. Forced GCs or reboots don't clear them
>>> out. We disable thrift and gossip; drain; snapshot; shutdown; clear
>>> data/Keyspace/Table/*.db and restore (hard-linking back into place to
>>> avoid data transfer) from the just-created snapshot; restart.
>>>
>>>
>>> On Mar 28, 2013, at 10:12 AM, Ben Chobot wrote:
>>>
>>>> Some of my Cassandra nodes in my 1.1.5 cluster show a large
>>>> discrepancy between what Cassandra says the SSTables should sum up to
>>>> and what df and du claim exist. During repairs this is almost always
>>>> pretty bad, but post-repair compactions tend to bring those numbers to
>>>> within a few percent of each other... usually. Sometimes they remain
>>>> much further apart after compactions have finished - for instance, I'm
>>>> looking at one node now that claims to have 205GB of SSTables but
>>>> actually has 450GB of files living in that CF's data directory. No
>>>> pending compactions, and the most recent compaction for this CF
>>>> finished just a few hours ago.
>>>>
>>>> nodetool cleanup has no effect.
>>>>
>>>> What could be causing these extra bytes, and how do I get them to go
>>>> away? I'm ok with a few extra GB of unexplained data, but an extra
>>>> 245GB (more than all the data this node is supposed to have!) is a
>>>> little extreme.
>>>
>>
>
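(For anyone who wants to try what Lanny describes, the snapshot-and-hardlink
rebuild works out to roughly the per-node sequence below. Treat it as a
sketch: the keyspace/column family names and the /var/lib/cassandra/data path
are placeholders, the service commands depend on how you run Cassandra, and
the snapshot flags are the 1.1-era nodetool syntax.)

    # placeholders: substitute your keyspace and column family
    KS=MyKeyspace
    CF=MyColumnFamily
    DIR=/var/lib/cassandra/data/$KS/$CF

    # stop client and cluster traffic, then flush everything to disk
    nodetool -h localhost disablethrift
    nodetool -h localhost disablegossip
    nodetool -h localhost drain

    # the snapshot hard-links only the SSTables Cassandra still knows about,
    # so any orphaned .db files it has lost track of are left behind
    nodetool -h localhost snapshot $KS -t rebuild

    sudo service cassandra stop

    # clear the live directory, then hard-link the known SSTables back from
    # the snapshot (hard links, so no data is actually copied)
    rm $DIR/*.db
    ln $DIR/snapshots/rebuild/* $DIR/

    sudo service cassandra start

    # once the node looks healthy, the "rebuild" snapshot can be removed
    # with nodetool clearsnapshot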