incubator-cassandra-user mailing list archives

From Nicolai Gylling ...@issuu.com>
Subject Re: Sudden increase in diskspace usage
Date Tue, 14 May 2013 09:39:10 GMT

On May 14, 2013, at 6:50 AM, aaron morton <aaron@thelastpickle.com> wrote:

>> Let's say we're seeing some bug in C*, and SSTables don't get deleted during compaction
>> (which I guess is the only reason for this consumption of diskspace).
> 
> Just out of interest can you check the number of SSTables reported by nodetool cfstats
> for a CF against the number of *-Data.db files in the appropriate directory on disk?
> Another test is to take a snapshot and see if there are files in the live directory not
> in the snapshot dir.
> 
> Either of these techniques may identify SSTables on disk that the server is not tracking.

> 
> Cheers

Currently we see 9272 Data.db files, but only 8944 are reported by nodetool cfstats. However,
C* 1.2.4 seems to correct the problem, as it has recovered most of the used space. We're still
waiting for the compactions to complete, though.

I'll check again once compaction is done.
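
For anyone who wants to script the two checks Aaron suggests, here is a minimal Python sketch. It is only an illustration under assumptions: the function names are made up, the directory paths are whatever your data and snapshot directories happen to be, and the SSTable count from nodetool cfstats is read off by hand rather than parsed. Files that are being written by a running compaction can show up in the listing, so a small difference is not proof of untracked SSTables.

```python
from pathlib import Path


def count_data_files(cf_dir):
    """Count *-Data.db files directly inside cf_dir.

    The glob is not recursive, so files under a snapshots
    subdirectory are not included in the count.
    """
    return sum(1 for p in Path(cf_dir).glob("*-Data.db") if p.is_file())


def untracked_candidates(live_dir, snapshot_dir):
    """Return names of *-Data.db files in live_dir that are absent from snapshot_dir.

    A snapshot only contains SSTables the server knows about, so
    anything listed here is a candidate for a file the server is
    not tracking (or one created after the snapshot was taken).
    """
    live = {p.name for p in Path(live_dir).glob("*-Data.db")}
    snap = {p.name for p in Path(snapshot_dir).glob("*-Data.db")}
    return sorted(live - snap)
```

Compare the result of count_data_files() for the CF directory with the SSTable count that cfstats prints for the same CF. With the numbers above the gap is 9272 - 8944 = 328 files.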

>  
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 10/05/2013, at 8:33 PM, Nicolai Gylling <ng@issuu.com> wrote:
> 
>>> On Wed, May 8, 2013 at 10:43 PM, Nicolai Gylling <ng@issuu.com> wrote:
>>>> At the time of normal operation there were 800 GB of free space on each node.
>>>> After the crash, C* started using a lot more, resulting in an
>>>> out-of-diskspace situation on 2 nodes, i.e. C* used up the 800 GB in just
>>>> 2 days, giving us very little time to do anything about it, since
>>>> repairs/joins take a considerable amount of time.
>>> 
>>> Did someone do a repair? Repair very frequently results in (usually
>>> temporary) >2x disk consumption.
>>> 
>> Repairs run regularly once a week, and normally don't take up much space,
>> as we're using Leveled Compaction Strategy.
>> 
>> 
>>>> What can make C* suddenly use this amount of disk-space? We did see a lot
>>>> of pending compactions on one node (7k).
>>> 
>>> Mostly repair.
>>> 
>>>> Any tips on recovering from an out-of-diskspace situation on multiple
>>>> nodes? I've tried moving some SSTables away, but C* seems to use
>>>> whatever space I free up in no time. I'm not sure if any of the nodes is
>>>> fully updated, as 'nodetool status' reports 3 different loads.
>>> 
>>> A relevant note here is that moving sstables out of the full partition
>>> while cassandra is running will not result in any space recovery,
>>> because Cassandra still has an open filehandle to that sstable. In
>>> order to deal with an out-of-disk-space condition you need to stop
>>> Cassandra. Unfortunately the JVM stops responding to a clean shutdown
>>> request when the disk is full, so you will have to kill -KILL the
>>> process.
>>> 
>>> If you have a lot of overwrites/fragmentation, you could attempt to
>>> clear enough space to do a major compaction of remaining data, do that
>>> major compaction, split your One Huge sstable with the (experimental)
>>> sstable_split tool and then copy temporarily moved sstables back onto
>>> the node. You could also attempt to use user defined compaction (via
>>> JMX endpoint) to strategically compact such data. If you grep for
>>> compaction in your logs, do you see compactions resulting in smaller
>>> output file sizes? (compacted to X% of original messages)
>>> 
>>> I agree with Alexis Rodriguez that Cassandra 1.2.0 is not a version
>>> anyone should run; it contains significant bugs.
>>> 
>>> =Rob
>> 
>> We're storing time series, so we don't have any overwrites and hardly any reduction
>> in sizes during compaction. I'll try to upgrade and see if that can help get some diskspace
>> back.
>> 
>> Let's say we're seeing some bug in C*, and SSTables don't get deleted during compaction
>> (which I guess is the only reason for this consumption of diskspace). Will C* 1.2.4 be able
>> to fix this? Or would it be a better solution to replace one node at a time, so we're sure
>> to only have the data that C* knows about?
>> 
>> 
> 

