cassandra-user mailing list archives

From Nate Yoder <n...@whistle.com>
Subject Re: Cassandra Files Taking up Much More Space than CF
Date Tue, 09 Dec 2014 16:27:44 GMT
Hi Ian,

Thanks for the suggestion, but I had already done that before the scenario
I described (to free up some space). When I ran nodetool cfstats it listed
0 snapshots, as expected, so unfortunately I don't think that is where my
space went.
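
To double-check the snapshot question outside of nodetool, the table's data
directory can be walked directly. The following is only a minimal sketch: the
function name is hypothetical, and it assumes snapshots live under a
"snapshots" subdirectory of the table directory. It sums apparent file sizes,
so hard-linked snapshot files are counted in both totals, unlike du.

```python
import os

def table_disk_usage(table_dir):
    """Return (live_bytes, snapshot_bytes) for one table's data directory.

    Assumption: snapshot files sit under a 'snapshots' subdirectory.
    Sizes are apparent sizes; hard links are not de-duplicated.
    """
    live = 0
    snapshots = 0
    for root, _dirs, files in os.walk(table_dir):
        # Path components of this directory relative to the table directory.
        rel_parts = os.path.relpath(root, table_dir).split(os.sep)
        in_snapshot = "snapshots" in rel_parts
        for name in files:
            try:
                size = os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # File vanished mid-walk (e.g. removed by compaction); skip it.
                continue
            if in_snapshot:
                snapshots += size
            else:
                live += size
    return live, snapshots
```

Calling table_disk_usage() on the table's directory and comparing the first
value with the total space that nodetool cfstats reports shows whether the
extra space is in live files or in snapshots.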

One additional piece of information I forgot to mention is that when I
ran nodetool status on the node, it listed all 6 nodes.

I have also heard that I may want to have a prime number of nodes, which
may help protect against split-brain.  Is this true?  If so, does it still
apply when I am using vnodes?

Thanks again,
Nate

--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // nate@whistle.com

On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose <ianrose@fullstory.com> wrote:

> Try `nodetool clearsnapshot` which will delete any snapshots you have.  I
> have never taken a snapshot with nodetool yet I found several snapshots on
> my disk recently (which can take a lot of space).  So perhaps they are
> automatically generated by some operation?  No idea.  Regardless, nuking
> those freed up a ton of space for me.
>
> - Ian
>
>
> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder <nate@whistle.com> wrote:
>
>> Hi All,
>>
>> I am new to Cassandra so I apologise in advance if I have missed anything
>> obvious but this one currently has me stumped.
>>
>> I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
>> C3.2XLarge nodes which overall is working very well for us.  However, after
>> letting it run for a while I seem to get into a situation where the amount
>> of disk space used far exceeds the total amount of data on each node and I
>> haven't been able to get the size to go back down except by stopping and
>> restarting the node.
>>
>> For example, in my data I have almost all of my data in one table.  On
>> one of my nodes right now the total space used (as reported by nodetool
>> cfstats) is 57.2 GB and there are no snapshots. However, when I look at the
>> size of the data files (using du) the data file for that table is 107GB.
>> Because C3.2XLarge nodes only have 160 GB of SSD you can see why this quickly
>> becomes a problem.
>>
>> Running nodetool compact didn't reduce the size and neither did running
>> nodetool repair -pr on the node.  I also tried nodetool flush and nodetool
>> cleanup (even though I have not added or removed any nodes recently) but it
>> didn't change anything either.  In order to keep my cluster up I then
>> stopped and started that node and the size of the data file dropped to 54GB
>> while the total column family size (as reported by nodetool) stayed about
>> the same.
>>
>> Any suggestions as to what I could be doing wrong?
>>
>> Thanks,
>> Nate
>>
>
>
