incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: lots of extra bytes on disk
Date Thu, 28 Mar 2013 18:40:21 GMT
Oh, and since our LCS was set to 10MB per file, it was easy to tell which
files had not converted yet.  Also, we ended up blowing away a CF on node 5
(of 6) and running a full repair on that CF, and afterward it was back to a
normal size as well.

Dean

On 3/28/13 12:35 PM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

>We had a runaway STCS like this due to our own mistakes but were not sure
>how to clean it up.  We went to LCS instead of STCS and that seemed to
>bring it way back down, since STCS had repeats and such between SSTables,
>which LCS mostly avoids.  I can't help much more than that info though.
>
>Dean
>
>On 3/28/13 12:31 PM, "Ben Chobot" <bench@instructure.com> wrote:
>
>>Sorry to make it confusing. I didn't have snapshots on some nodes; I just
>>made a snapshot on a node with this problem.
>>
>>So to be clear, on this one example node....
>> Cassandra reports ~250GB of space used
>> In a CF data directory (before snapshots existed), du -sh showed ~550GB
>> After the snapshot, du in the same directory still showed ~550GB
>>(they're hard links, so that's correct)
>> du in the snapshot directory for that CF shows ~250GB, and ls shows ~50
>>fewer files.
>>
>>
>>
>>On Mar 28, 2013, at 11:10 AM, Hiller, Dean wrote:
>>
>>> I am confused.  I thought you said you don't have a snapshot.  df/du
>>> reports space used by existing data AND the snapshot.  Cassandra only
>>> reports on space used by actual data. If you move the snapshots, does
>>> df/du match what Cassandra says?
>>> 
>>> Dean
>>> 
>>> On 3/28/13 12:05 PM, "Ben Chobot" <bench@instructure.com> wrote:
>>> 
>>>> ...though interestingly, the snapshots of these CFs have the "right"
>>>> amount of data in them (i.e. it agrees with the live SSTable size
>>>> reported by Cassandra). Is it total insanity to remove the files from
>>>> the data directory that are not included in the snapshot, so long as
>>>> they were created before the snapshot?
>>>> 
>>>> On Mar 28, 2013, at 10:54 AM, Hiller, Dean wrote:
>>>> 
>>>>> Have you cleaned up your snapshots? Those take extra space and don't
>>>>> just go away unless you delete them.
>>>>> 
>>>>> Dean
>>>>> 
>>>>> On 3/28/13 11:46 AM, "Ben Chobot" <bench@instructure.com> wrote:
>>>>> 
>>>>>> Are you also running 1.1.5? I'm wondering (ok, hoping) that this
>>>>>> might be fixed if I upgrade.
>>>>>> 
>>>>>> On Mar 28, 2013, at 8:53 AM, Lanny Ripple wrote:
>>>>>> 
>>>>>>> We occasionally (twice now on a 40 node cluster over the last 6-8
>>>>>>> months) see this.  My best guess is that Cassandra can fail to mark
>>>>>>> an SSTable for cleanup somehow.  Forced GCs or reboots don't clear
>>>>>>> them out.  We disable thrift and gossip; drain; snapshot; shutdown;
>>>>>>> clear data/Keyspace/Table/*.db and restore (hard-linking back into
>>>>>>> place to avoid data transfer) from the just-created snapshot;
>>>>>>> restart.
>>>>>>> 
>>>>>>> 
>>>>>>> On Mar 28, 2013, at 10:12 AM, Ben Chobot <bench@instructure.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Some of my Cassandra nodes in my 1.1.5 cluster show a large
>>>>>>>> discrepancy between what Cassandra says the SSTables should sum
>>>>>>>> up to, and what df and du claim exist. During repairs, this is
>>>>>>>> almost always pretty bad, but post-repair compactions tend to
>>>>>>>> bring those numbers to within a few percent of each other...
>>>>>>>> usually. Sometimes they remain much further apart after
>>>>>>>> compactions have finished - for instance, I'm looking at one node
>>>>>>>> now that claims to have 205GB of SSTables, but actually has 450GB
>>>>>>>> of files living in that CF's data directory. No pending
>>>>>>>> compactions, and the most recent compaction for this CF finished
>>>>>>>> just a few hours ago.
>>>>>>>> 
>>>>>>>> nodetool cleanup has no effect.
>>>>>>>> 
>>>>>>>> What could be causing these extra bytes, and how to get them to
>>>>>>>> go away? I'm ok with a few extra GB of unexplained data, but an
>>>>>>>> extra 245GB (more than all the data this node is supposed to
>>>>>>>> have!) is a little extreme.
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>
>
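
The comparison discussed in the thread (which files sit in the live CF
data directory but are absent from a fresh snapshot) can be sketched in
Python. This is only an illustration, not anything posted by the
participants: the function name and the directory arguments are
hypothetical, and it assumes a POSIX filesystem where the snapshot consists
of hard links, so a live file and its snapshot entry share a device and
inode number. It only lists candidates; it deletes nothing.

```python
import os


def files_missing_from_snapshot(data_dir, snapshot_dir):
    """List (name, size_in_bytes) for regular files in data_dir that have
    no hard link in snapshot_dir.

    Hypothetical helper: both paths are supplied by the caller, e.g. a CF
    data directory and one of its snapshot subdirectories.
    """
    # Collect the (device, inode) identity of every file in the snapshot.
    snapshot_ids = set()
    for entry in os.scandir(snapshot_dir):
        if entry.is_file(follow_symlinks=False):
            st = os.stat(entry.path, follow_symlinks=False)
            snapshot_ids.add((st.st_dev, st.st_ino))

    # Any live file whose identity is not in that set was not snapshotted.
    missing = []
    for entry in os.scandir(data_dir):
        # Subdirectories (such as the snapshots directory) are skipped.
        if entry.is_file(follow_symlinks=False):
            st = os.stat(entry.path, follow_symlinks=False)
            if (st.st_dev, st.st_ino) not in snapshot_ids:
                missing.append((entry.name, st.st_size))
    return sorted(missing)
```

Summing the sizes in the returned list gives a rough figure to compare
against the gap between what du shows and what Cassandra reports. Files
written after the snapshot was taken will also show up in the list, so the
output is a starting point for investigation rather than a delete list.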

