cassandra-user mailing list archives

From Aiman Parvaiz <ai...@flipagram.com>
Subject Re: Cassandra 2.1.12 Node size
Date Thu, 14 Apr 2016 13:41:48 GMT
Thanks for the response, Alain. I am using STCS and would like to take some action, as we will
be hitting 50% disk space pretty soon. Would adding nodes be the right way to start if I want
to get the data per node down? Otherwise, can you or someone on the list please suggest the
right way to go about it?
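
For what it's worth, here is the rough back-of-the-envelope math I have been using to think
about how many nodes to add (the node count is real, the disk size is my assumption, and the
script itself is just an illustration):

    # Rough estimate of data per node after adding nodes. Assumes vnodes spread
    # data evenly and the replication factor stays the same; figures are illustrative.
    current_nodes = 9
    data_per_node_tb = 1.0        # ~1 TB per node today
    disk_per_node_tb = 2.0        # assumed disk size, adjust to the real value
    total_data_tb = current_nodes * data_per_node_tb

    for added in range(0, 7):
        nodes = current_nodes + added
        per_node = total_data_tb / nodes
        pct = 100 * per_node / disk_per_node_tb
        print(f"{nodes} nodes -> ~{per_node:.2f} TB/node ({pct:.0f}% of disk)")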

Thanks

Sent from my iPhone

> On Apr 14, 2016, at 5:17 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
> 
> Hi,
> 
>> I seek advice on data size per node. Each of my nodes has close to 1 TB of data. I am not
>> seeing any issues as of now but wanted to run it by you guys to see if this data size is
>> pushing the limits in any manner and if I should be working on reducing the data size per node.
> 
> There is no real limit to the data size other than about 50% of the machine's disk space when
> using STCS and 80% when using LCS. Those are 'soft' limits, since the real constraint depends
> mainly on the size of your biggest sstables and the number of concurrent compactions, but to
> stay out of trouble it is better to keep things under control, below the limits mentioned above.
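> 
> To make the STCS figure concrete, here is the kind of quick sanity check you can do on paper
> (the disk size is a placeholder; the point is that in the worst case a single STCS compaction
> may rewrite most of the data at once, which is where the ~50% guideline comes from):
> 
>     # Rough STCS headroom check: a compaction can temporarily need about as much
>     # free space as the sstables it rewrites. Figures below are placeholders.
>     disk_tb = 2.0
>     data_tb = 1.0
>     worst_case_compaction_tb = data_tb   # worst case: everything compacted in one go
>     free_tb = disk_tb - data_tb
>     print("headroom OK" if free_tb >= worst_case_compaction_tb else "risk of filling the disk")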
> 
>> I will be migrating to incremental repairs shortly, and a full repair currently takes
>> 20 hr/node. I am not seeing any issues with the nodes for now.
> 
> As you noticed, you need to keep in mind that the larger the dataset, the longer operations
> will take: repairs, but also bootstrapping a node, replacing a node, removing a node, or any
> other operation that needs to stream or read the data. Repair time can indeed be mitigated by
> using incremental repairs.
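> 
> For example, something along these lines can be scripted per node once you have migrated (a
> sketch only: the keyspace name is a placeholder, and on 2.1 incremental repair is usually run
> in parallel, hence -inc together with -par, after following the documented migration steps):
> 
>     # Sketch: trigger an incremental repair on the local node via nodetool.
>     # "my_keyspace" is a placeholder keyspace name.
>     import subprocess
> 
>     subprocess.run(["nodetool", "repair", "-inc", "-par", "my_keyspace"], check=True)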
> 
>> I am running a 9 node C* 2.1.12 cluster.
> 
> It should be quite safe to give incremental repair a try, as many bugs have been fixed in
> this version:
> 
> FIX 2.1.12 - A lot of sstables using range repairs due to anticompaction - incremental only
> 
> https://issues.apache.org/jira/browse/CASSANDRA-10422
> 
> FIX 2.1.12 - repair hang when replica is down - incremental only
> 
> https://issues.apache.org/jira/browse/CASSANDRA-10288
> 
> If you are using DTCS, be aware of https://issues.apache.org/jira/browse/CASSANDRA-11113
> 
> If you are using LCS, keep a close eye on the sstable count and the number of pending compactions.
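> 
> For instance, the pending compaction count can be polled with something like this (a rough
> sketch, assuming nodetool is on the PATH of the node you run it on):
> 
>     # Sketch: read the "pending tasks" line from `nodetool compactionstats`.
>     import subprocess
> 
>     out = subprocess.run(["nodetool", "compactionstats"],
>                          capture_output=True, text=True, check=True).stdout
>     for line in out.splitlines():
>         if line.lower().startswith("pending tasks"):
>             print(line.strip())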
> 
> As a general comment, I would say that Cassandra has evolved to handle huge datasets
> (off-heap memory structures, bigger heaps with G1GC, JBOD, vnodes, ...). Today Cassandra works
> just fine with big datasets. I have seen clusters with 4+ TB per node and others using only a
> few GB per node. It all depends on your requirements and your machines' specs. If fast
> operations are absolutely necessary, keep the data size small. If you want to use the entire
> disk space (up to the 50/80% of total disk space mentioned above), go ahead, as long as the
> other resources are fine (CPU, memory, disk throughput, ...).
> 
> C*heers,
> 
> -----------------------
> Alain Rodriguez - alain@thelastpickle.com
> France
> 
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> 
> 2016-04-14 10:57 GMT+02:00 Aiman Parvaiz <aiman@flipagram.com>:
>> Hi all,
>> I am running a 9 node C* 2.1.12 cluster. I seek advice on data size per node. Each of my
>> nodes has close to 1 TB of data. I am not seeing any issues as of now but wanted to run it by
>> you guys to see if this data size is pushing the limits in any manner and if I should be
>> working on reducing the data size per node. I will be migrating to incremental repairs shortly,
>> and a full repair currently takes 20 hr/node. I am not seeing any issues with the nodes for now.
>> 
>> Thanks
> 
