From: Alain RODRIGUEZ
Date: Thu, 14 Apr 2016 13:47:40 +0200
Subject: Re: Cassandra 2.1.12 Node size
To: user@cassandra.apache.org

Hi,

> I seek advice in data size per node.
> Each of my nodes has close to 1 TB of
> data. I am not seeing any issues as of now, but wanted to run it by you guys
> if this data size is pushing the limits in any manner and if I should be
> working on reducing the data size per node.

There is no real limit to the data size other than 50% of the machine's disk space using STCS, and 80% if you are using LCS. Those are 'soft' limits, as they mainly depend on your biggest sstable size and the number of concurrent compactions, but to stay away from trouble it is better to keep things under control, below the limits mentioned above.

> I will be migrating to incremental repairs shortly, and a full repair as of
> now takes 20 hr/node. I am not seeing any issues with the nodes for now.

As you noticed, you need to keep in mind that the larger the dataset is, the longer operations will take. Repairs, but also bootstrapping a node, replacing a node, removing a node: any operation that requires streaming the data or reading it. Repair time can indeed be mitigated by using incremental repairs.

> I am running a 9 node C* 2.1.12 cluster.

It should be quite safe to give incremental repair a try, as many bugs have been fixed in this version:

FIX 2.1.12 - A lot of sstables using range repairs due to anticompaction - incremental only
https://issues.apache.org/jira/browse/CASSANDRA-10422

FIX 2.1.12 - repair hang when replica is down - incremental only
https://issues.apache.org/jira/browse/CASSANDRA-10288

If you are using DTCS, be aware of https://issues.apache.org/jira/browse/CASSANDRA-11113

If you are using LCS, watch the sstable and pending compaction counts closely.

As a general comment, I would say that Cassandra has evolved to be able to handle huge datasets (off-heap memory structures, larger heaps thanks to G1GC, JBOD, vnodes, ...). Today Cassandra works just fine with big datasets. I have seen clusters with 4+ TB per node and others using a few GB per node. It all depends on your requirements and your machines' specs.
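To make those 50% (STCS) / 80% (LCS) soft limits concrete, here is a quick back-of-the-envelope sketch; the helper name and the example disk size are illustrative only, not anything from Cassandra itself:

```python
# Rough sanity check for the 'soft' per-node data size limits discussed
# above: ~50% of disk with STCS (a worst-case compaction may need to
# rewrite most of the data), ~80% with LCS. Illustrative numbers only.

SOFT_LIMITS = {"STCS": 0.50, "LCS": 0.80}

def max_recommended_data_gb(disk_gb, strategy):
    """Return the soft upper bound on live data for a given disk size."""
    try:
        return disk_gb * SOFT_LIMITS[strategy]
    except KeyError:
        raise ValueError(f"unknown compaction strategy: {strategy}")

# With a hypothetical 2 TB disk, a node holding ~1 TB of data (as in the
# question) is right at the STCS limit, but still has headroom under LCS.
print(max_recommended_data_gb(2000, "STCS"))  # 1000.0
print(max_recommended_data_gb(2000, "LCS"))   # 1600.0
```

So whether 1 TB per node is "too big" depends less on the number itself than on how much total disk each node has and which compaction strategy it runs.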
If fast operations are absolutely necessary, keep it small. If you want to use the entire disk space (50/80% of the total disk space max), go ahead as long as the other resources are fine (CPU, memory, disk throughput, ...).

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-04-14 10:57 GMT+02:00 Aiman Parvaiz:

> Hi all,
> I am running a 9 node C* 2.1.12 cluster. I seek advice in data size per
> node. Each of my nodes has close to 1 TB of data. I am not seeing any issues
> as of now but wanted to run it by you guys if this data size is pushing the
> limits in any manner and if I should be working on reducing the data size
> per node. I will be migrating to incremental repairs shortly and a full
> repair as of now takes 20 hr/node. I am not seeing any issues with the
> nodes for now.
>
> Thanks