Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of comomore@gmail.com designates
 209.85.220.173 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <524D60F4.4030409@opera.com>
References: 
 <CAP7WDFWBZwP70N4JRVVum-5ehUvxEjTYJp9-QYeC=EUWZwtHtQ@mail.gmail.com>
	<CAOT3TWrPHYdjAZ_esHJiqwK7TEw4yyMtWQjSFtAkxBKMM6KSHA@mail.gmail.com>
	<CAP7WDFXpO-yCkv1+w60PpLLqanxFzcv+mSH-t0sOaNme=PK18Q@mail.gmail.com>
	<CAJV_UYcGFQM_TUbdHef0v6HGfe5pJ6eH-fHPtY3oUJ-TebiCDw@mail.gmail.com>
	<CAP7WDFXo7DhdgLH77MGDcCoKkMvPZBWrTAeon8uysY2KdsFYhA@mail.gmail.com>
	<524D60F4.4030409@opera.com>
Date: Thu, 3 Oct 2013 08:02:35 -0500
Message-ID: 
 <CAP7WDFXyTiCrqk27KTMXPQ70hOU7s9bBP1qg=3jM3MUbCB_qOg@mail.gmail.com>
Subject: Re: Cassandra Heap Size for data more than 1 TB
From: srmore <comomore@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=bcaec5016625b7e49504e7d5cc96

--bcaec5016625b7e49504e7d5cc96
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Thanks Mohit and Michael,
That's what I thought. I have tried all the other avenues and will give ParNew
a try. With 1.0.xx I have issues when data sizes go up; hopefully that will
not be the case with 1.2.

Just curious, has anyone tried 1.2 with a large data set, around 1 TB?


Thanks !


On Thu, Oct 3, 2013 at 7:20 AM, Michał Michalski <michalm@opera.com> wrote:

> I was experimenting with 128 vs. 512 some time ago and I was unable to see
> any difference in terms of performance. I'd probably check 1024 too, but we
> migrated to 1.2 and heap space was not an issue anymore.
>
> M.
>
> On 02.10.2013 16:32, srmore wrote:
>
>> I changed my index_interval from 128 to 512. Does it make sense to
>> increase it beyond this?
>>
>>
>> On Wed, Oct 2, 2013 at 9:30 AM, cem <cayiroglu@gmail.com> wrote:
>>
>>> Have a look at index_interval.
>>>
>>> Cem.
>>>
>>>
>>> On Wed, Oct 2, 2013 at 2:25 PM, srmore <comomore@gmail.com> wrote:
>>>
>>>> The version of Cassandra I am using is 1.0.11; we are migrating to 1.2.X
>>>> though. We had tuned bloom filters (0.1) and AFAIK making it lower than
>>>> this won't matter.
>>>>
>>>> Thanks !
>>>>
>>>>
>>>> On Tue, Oct 1, 2013 at 11:54 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>>>>
>>>>> Which Cassandra version are you on? Essentially, heap size is a function
>>>>> of the number of keys/metadata. In Cassandra 1.2, a lot of the metadata,
>>>>> like bloom filters, was moved off heap.
>>>>>
>>>>>
>>>>> On Tue, Oct 1, 2013 at 9:34 PM, srmore <comomore@gmail.com> wrote:
>>>>>
>>>>>> Does anyone know roughly what the heap size should be for Cassandra
>>>>>> with 1 TB of data? We started with about 200 GB and now on one of the
>>>>>> nodes we are already at 1 TB. We were using 8 GB of heap and that served
>>>>>> us well until we reached 700 GB, where we started seeing failures and
>>>>>> nodes flipping.
>>>>>>
>>>>>> With 1 TB of data the node refuses to come back up due to lack of
>>>>>> memory. Needless to say, repairs and compactions take a lot of time. We
>>>>>> upped the heap from 8 GB to 12 GB and suddenly everything, i.e. the
>>>>>> repair tasks and the compaction tasks, started moving rapidly. But soon
>>>>>> (in about 9-10 hrs) we started seeing the same symptoms as we were
>>>>>> seeing with 8 GB.
>>>>>>
>>>>>> So my question is: how do I determine the optimal heap size for data
>>>>>> around 1 TB?
>>>>>>
>>>>>> Following are some of my JVM settings:
>>>>>>
>>>>>> -Xms8G
>>>>>> -Xmx8G
>>>>>> -Xmn800m
>>>>>> -XX:NewSize=1200M
>>>>>> -XX:MaxTenuringThreshold=2
>>>>>> -XX:SurvivorRatio=4
>>>>>>
>>>>>> Thanks !
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
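
For anyone who wants to put rough numbers on the "heap size is a function of
the number of keys/metadata" point in this thread, here is a minimal
back-of-the-envelope sketch. It is not Cassandra's own accounting. It assumes
the textbook Bloom filter sizing formula and that one index entry is sampled
per index_interval keys. The key count and the per-entry byte cost are
hypothetical placeholders, so measure your own cluster before trusting any
figure it prints.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Textbook Bloom filter sizing: bits needed per key for a target
    false-positive chance. Cassandra's real filter may round differently."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_filter_bytes(num_keys: int, fp_chance: float) -> float:
    """Approximate total Bloom filter size in bytes for num_keys keys."""
    return num_keys * bloom_bits_per_key(fp_chance) / 8


def index_sample_count(num_keys: int, index_interval: int) -> int:
    """Number of sampled index entries, assuming one sample is kept for
    every index_interval keys (ceiling division)."""
    if index_interval <= 0:
        raise ValueError("index_interval must be positive")
    return -(-num_keys // index_interval)


def index_sample_bytes(num_keys: int, index_interval: int,
                       bytes_per_entry: int) -> int:
    """Approximate memory held by index samples. bytes_per_entry is an
    assumed average cost per sample, not a measured Cassandra figure."""
    return index_sample_count(num_keys, index_interval) * bytes_per_entry


if __name__ == "__main__":
    NUM_KEYS = 1_000_000_000   # hypothetical number of row keys on the node
    BYTES_PER_ENTRY = 64       # hypothetical average size of one index sample

    for fp_chance in (0.01, 0.1):
        mib = bloom_filter_bytes(NUM_KEYS, fp_chance) / 2**20
        print(f"bloom fp_chance={fp_chance}: ~{mib:.0f} MiB")

    for interval in (128, 512, 1024):
        mib = index_sample_bytes(NUM_KEYS, interval, BYTES_PER_ENTRY) / 2**20
        print(f"index_interval={interval}: ~{mib:.0f} MiB of samples")
```

Under these assumptions, going from index_interval 128 to 512 cuts the number
of samples by a factor of four, and a higher bloom filter false-positive
chance shrinks the filter. Both trade memory for extra disk reads.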

--bcaec5016625b7e49504e7d5cc96--