cassandra-user mailing list archives

From Jonathan Haddad <>
Subject Re: Cassandra Files Taking up Much More Space than CF
Date Tue, 09 Dec 2014 17:02:52 GMT
Well, I personally don't like RF=2.  It means that if you're using CL=QUORUM
and a node goes down, you're going to have a bad time (downtime).  If you're
using CL=ONE then you'd be OK.  However, I am not wild about losing a node
and having only one copy of my data available in prod.
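
For reference, the arithmetic behind that: a QUORUM read or write needs
floor(RF/2) + 1 replicas, so with RF=2 that is both replicas, while with
RF=3 it is 2 of 3 and you can afford to lose a node.  A quick illustration
(plain shell, not a Cassandra command):

    # quorum = floor(RF / 2) + 1
    RF=2; echo "RF=$RF needs $(( RF / 2 + 1 )) replicas up"   # 2 -> no tolerance for a down node
    RF=3; echo "RF=$RF needs $(( RF / 2 + 1 )) replicas up"   # 2 -> one node can be down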

On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder <> wrote:

> Thanks Jonathan.  So there is nothing too idiotic about my current set-up
> of 6 boxes, each with 256 vnodes, and an RF of 2?
> I appreciate the help,
> Nate
> --
> *Nathanael Yoder*
> Principal Engineer & Data Scientist, Whistle
> 415-944-7344 //
> On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad <> wrote:
>> You don't need a prime number of nodes in your ring, but it's not a bad
>> idea to have it be a multiple of your RF when your cluster is small.
>> On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder <> wrote:
>>> Hi Ian,
>>> Thanks for the suggestion, but I had actually already done that prior to
>>> the scenario I described (to get myself some free space).  When I ran
>>> nodetool cfstats it listed 0 snapshots, as expected, so unfortunately I
>>> don't think that is where my space went.
>>> One additional piece of information I forgot to point out is that when I
>>> ran nodetool status on the node it included all 6 nodes.
>>> I have also heard it mentioned that I may want to have a prime number of
>>> nodes, which may help protect against split-brain.  Is this true?  If so,
>>> does it still apply when I am using vnodes?
>>> Thanks again,
>>> Nate
>>> --
>>> *Nathanael Yoder*
>>> Principal Engineer & Data Scientist, Whistle
>>> 415-944-7344 //
>>> On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose <> wrote:
>>>> Try `nodetool clearsnapshot`, which will delete any snapshots you have.
>>>> I have never taken a snapshot with nodetool, yet I recently found several
>>>> snapshots on my disk (and they can take up a lot of space).  So perhaps
>>>> they are automatically generated by some operation?  No idea.  Regardless,
>>>> nuking those freed up a ton of space for me.
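>>>> For what it's worth, something along these lines should show what is
>>>> there and clear it out (in 2.1, `nodetool listsnapshots` will list them;
>>>> "my_keyspace" below is just a placeholder):
>>>>
>>>>     nodetool listsnapshots               # snapshot name, keyspace, table, size
>>>>     nodetool clearsnapshot               # remove all snapshots on this node
>>>>     nodetool clearsnapshot my_keyspace   # or only one keyspace's snapshots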
>>>> - Ian
>>>> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder <> wrote:
>>>>> Hi All,
>>>>> I am new to Cassandra, so I apologise in advance if I have missed
>>>>> anything obvious, but this one currently has me stumped.
>>>>> I am currently running a 6-node Cassandra 2.1.1 cluster on EC2 using
>>>>> C3.2XLarge nodes, which overall is working very well for us.  However,
>>>>> after letting it run for a while I get into a situation where the amount
>>>>> of disk space used far exceeds the total amount of data on each node,
>>>>> and I haven't been able to get the size to go back down except by
>>>>> stopping and restarting the node.
>>>>> For example, almost all of my data is in one table.  On one of my nodes
>>>>> right now the total space used (as reported by nodetool cfstats) is
>>>>> 57.2 GB and there are no snapshots.  However, when I look at the size of
>>>>> the data files (using du) the data file for that table is 107 GB.
>>>>> Because the C3.2XLarge only has 160 GB of SSD, you can see why this
>>>>> quickly becomes a problem.
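>>>>> Concretely, what I am comparing looks roughly like this (paths and
>>>>> table names approximate for our set-up):
>>>>>
>>>>>     nodetool cfstats my_keyspace.my_table | grep 'Space used'
>>>>>     du -sh /var/lib/cassandra/data/my_keyspace/my_table-*/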
>>>>> Running nodetool compact didn't reduce the size, and neither did
>>>>> running nodetool repair -pr on the node.  I also tried nodetool flush
>>>>> and nodetool cleanup (even though I have not added or removed any nodes
>>>>> recently), but it didn't change anything either.  In order to keep my
>>>>> cluster up I then stopped and started that node, and the size of the
>>>>> data file dropped to 54 GB while the total column family size (as
>>>>> reported by nodetool) stayed about the same.
>>>>> Any suggestions as to what I could be doing wrong?
>>>>> Thanks,
>>>>> Nate
