From: "Hiller, Dean"
To: user@cassandra.apache.org
Date: Thu, 28 Mar 2013 12:35:14 -0600
Subject: Re: lots of extra bytes on disk

We had a runaway STCS situation like this (due to our own mistakes) and
were not sure how to clean it up. Switching from STCS to LCS brought
disk usage way back down, since STCS leaves duplicate copies of rows
across SSTables, which LCS mostly avoids (a sketch of the schema change
is below). I can't help much more than that, though.

Dean
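For reference, the switch itself is a single schema change. A minimal
sketch using the 1.1-era cassandra-cli, where the keyspace "myks" and
column family "mycf" are placeholders and the exact option syntax may
vary between point releases:

    $ cassandra-cli -h localhost
    [default@unknown] use myks;
    [default@myks] update column family mycf
        with compaction_strategy = 'LeveledCompactionStrategy';

Existing SSTables are then reorganized into levels by background
compaction; nodetool compactionstats shows the progress, and the
on-disk footprint should shrink as overlapping row versions get merged.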
On 3/28/13 12:31 PM, "Ben Chobot" wrote:

>Sorry to make it confusing. I didn't have snapshots on some nodes; I
>just made a snapshot on a node with this problem.
>
>So to be clear, on this one example node...
>  Cassandra reports ~250GB of space used.
>  In a CF data directory (before snapshots existed), du -sh showed
>  ~550GB.
>  After the snapshot, du in the same directory still showed ~550GB
>  (they're hard links, so that's correct).
>  du in the snapshot directory for that CF shows ~250GB, and ls shows
>  ~50 fewer files.
>
>On Mar 28, 2013, at 11:10 AM, Hiller, Dean wrote:
>
>> I am confused. I thought you said you don't have a snapshot. df/du
>> report space used by the existing data AND any snapshots. Cassandra
>> only reports on space used by actual data... if you move the
>> snapshots, does df/du match what cassandra says?
>>
>> Dean
>>
>> On 3/28/13 12:05 PM, "Ben Chobot" wrote:
>>
>>> ...though interestingly, the snapshots of these CFs have the "right"
>>> amount of data in them (i.e. it agrees with the live SSTable size
>>> reported by cassandra). Is it total insanity to remove the files
>>> from the data directory not included in the snapshot, so long as
>>> they were created before the snapshot?
>>>
>>> On Mar 28, 2013, at 10:54 AM, Hiller, Dean wrote:
>>>
>>>> Have you cleaned up your snapshots? Those take extra space and
>>>> don't just go away unless you delete them.
>>>>
>>>> Dean
>>>>
>>>> On 3/28/13 11:46 AM, "Ben Chobot" wrote:
>>>>
>>>>> Are you also running 1.1.5? I'm wondering (ok, hoping) that this
>>>>> might be fixed if I upgrade.
>>>>>
>>>>> On Mar 28, 2013, at 8:53 AM, Lanny Ripple wrote:
>>>>>
>>>>>> We occasionally (twice now on a 40-node cluster over the last
>>>>>> 6-8 months) see this. My best guess is that Cassandra can
>>>>>> somehow fail to mark an SSTable for cleanup. Forced GCs or
>>>>>> reboots don't clear them out. We disable thrift and gossip;
>>>>>> drain; snapshot; shut down; clear data/Keyspace/Table/*.db and
>>>>>> restore from the just-created snapshot (hard-linking back into
>>>>>> place to avoid data transfer); restart.
>>>>>>
>>>>>> On Mar 28, 2013, at 10:12 AM, Ben Chobot wrote:
>>>>>>
>>>>>>> Some of the cassandra nodes in my 1.1.5 cluster show a large
>>>>>>> discrepancy between what cassandra says the SSTables should
>>>>>>> sum up to and what df and du say actually exists. During
>>>>>>> repairs this is almost always pretty bad, but post-repair
>>>>>>> compactions tend to bring those numbers to within a few
>>>>>>> percent of each other... usually. Sometimes they remain much
>>>>>>> further apart after compactions have finished - for instance,
>>>>>>> I'm looking at one node now that claims to have 205GB of
>>>>>>> SSTables but actually has 450GB of files living in that CF's
>>>>>>> data directory. There are no pending compactions, and the
>>>>>>> most recent compaction for this CF finished just a few hours
>>>>>>> ago.
>>>>>>>
>>>>>>> nodetool cleanup has no effect.
>>>>>>>
>>>>>>> What could be causing these extra bytes, and how do I get them
>>>>>>> to go away? I'm ok with a few extra GB of unexplained data,
>>>>>>> but an extra 245GB (more than all the data this node is
>>>>>>> supposed to have!) is a little extreme.
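For readers hitting the same orphaned-SSTable problem, here is a
minimal, untested sketch of the clear-and-restore procedure Lanny
describes. It assumes the stock 1.1 data layout
(/var/lib/cassandra/data/<Keyspace>/<ColumnFamily>, with snapshots in a
snapshots/ subdirectory of each CF directory), placeholder names MyKS
and MyCF, and a node managed as a system service; try it on one node
first:

    # stop serving client and cluster traffic, then flush memtables
    nodetool -h localhost disablethrift
    nodetool -h localhost disablegossip
    nodetool -h localhost drain

    # snapshot: only SSTables Cassandra still considers live get
    # hard-linked into the snapshot directory
    nodetool -h localhost snapshot -t rescue MyKS

    sudo service cassandra stop

    # remove every data file, then hard-link the snapshot back into
    # place; hard links mean no data is copied, and deleting the
    # original names leaves the snapshot's links (same inodes) intact
    cd /var/lib/cassandra/data/MyKS/MyCF
    rm -f -- *.db
    ln snapshots/rescue/* .

    sudo service cassandra start

Because the snapshot only links live SSTables, anything Cassandra had
silently orphaned is left behind and removed by the rm; afterwards
du -sh of the CF directory should agree with the live size that
nodetool cfstats reports.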