Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of comomore@gmail.com designates
 209.85.220.182 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <0DAD2A4E-F7B6-4421-B955-708666BA6887@grapheffect.com>
References: 
 <CAP7WDFUPhYVOsbMj63Y-FM-kUD22ncaVwKYyuOfdVZskM1qycw@mail.gmail.com>
	<0DAD2A4E-F7B6-4421-B955-708666BA6887@grapheffect.com>
Date: Mon, 3 Jun 2013 09:07:54 -0500
Message-ID: 
 <CAP7WDFXDd68Bh77gKqe=8L9FPaSpjGQc2WiQSCyXLnOTxifthQ@mail.gmail.com>
Subject: Re: Cassandra performance decreases drastically with increase in data
 size.
From: srmore <comomore@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001a11c2013eb1cb5704de407d4f

--001a11c2013eb1cb5704de407d4f
Content-Type: text/plain; charset=ISO-8859-1

Thanks all for the help.
I ran the traffic over the weekend. Surprisingly, my heap was doing OK
(around 5.7 GB of 8 GB), but GC activity went nuts and dropped the
throughput. I will probably increase the number of nodes.
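
A heap that looks fine while throughput drops usually needs GC count and GC
time watched over an interval. The sketch below reads them from the JVM's
standard GarbageCollectorMXBeans. It is a generic illustration, not
Cassandra tooling: it samples the JVM it runs in, and the class name
GcActivitySampler and the allocation loop standing in for real traffic are
hypothetical. For a running Cassandra node, the same MXBeans are exposed
over JMX (java.lang:type=GarbageCollector,...) and can be read with a JMX
client such as jconsole.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: sample cumulative GC activity from the platform MXBeans.
public class GcActivitySampler {

    /** Returns {total collections, total collection time in ms} over all collectors. */
    public static long[] snapshot() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    /** Approximate share of an interval's wall-clock time that was spent in GC. */
    public static double gcTimeFraction(long[] before, long[] after, long wallMs) {
        if (wallMs <= 0) {
            throw new IllegalArgumentException("wallMs must be positive");
        }
        return (double) (after[1] - before[1]) / wallMs;
    }

    public static void main(String[] args) {
        long[] before = snapshot();
        long start = System.nanoTime();
        // Hypothetical stand-in for real traffic: churn short-lived garbage.
        byte[] sink = null;
        for (int i = 0; i < 200_000; i++) {
            sink = new byte[1024];
        }
        long wallMs = Math.max(1, (System.nanoTime() - start) / 1_000_000);
        long[] after = snapshot();
        System.out.printf("GCs: %d, GC time: %d ms, GC share of wall time: %.1f%%%n",
                after[0] - before[0], after[1] - before[1],
                100.0 * gcTimeFraction(before, after, wallMs));
    }
}
```

Sampling periodically and comparing deltas matters more than any single
reading: a rising GC share at a steady heap size is the pattern described
above.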

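The other interesting thing I noticed was that there were some objects with
finalize() methods; these could potentially cause GC issues, because an
object with a finalizer cannot be reclaimed until the finalizer thread has
run it and a later collection picks it up.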
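
To check whether finalizable objects are piling up, the JDK's MemoryMXBean
reports an approximate count of objects waiting for finalization. The
sketch below is a hypothetical demo, not code from Cassandra: LegacyResource
stands in for whatever classes override finalize() (a method deprecated in
newer JDKs), and System.gc() is only a hint, so the printed numbers vary
from run to run.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: observe finalization backlog via the platform MemoryMXBean.
public class FinalizerPressure {

    static final AtomicInteger FINALIZED = new AtomicInteger();

    /** Hypothetical resource holder that relies on finalize() for cleanup. */
    static final class LegacyResource {
        private final byte[] payload = new byte[512];

        @SuppressWarnings({"deprecation", "removal"})
        @Override
        protected void finalize() {
            // Runs on the JVM's finalizer thread, at some point after the
            // object is found unreachable; its memory is freed only by a later GC.
            FINALIZED.incrementAndGet();
        }
    }

    /** Approximate number of objects whose finalization is still pending. */
    public static int pendingFinalization() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getObjectPendingFinalizationCount();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 100_000; i++) {
            new LegacyResource(); // garbage immediately, but queued for finalize()
        }
        System.gc(); // only a hint; the JVM may ignore it
        System.out.println("pending finalization: " + pendingFinalization());
        Thread.sleep(200); // give the finalizer thread a moment to catch up
        System.out.println("finalized so far: " + FINALIZED.get());
    }
}
```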


On Fri, May 31, 2013 at 1:47 AM, Aiman Parvaiz <aiman@grapheffect.com> wrote:

> I believe you should roll out more nodes as a temporary fix to your
> problem. 400GB on all nodes means (as correctly mentioned in other mails of
> this thread) that you are spending more time on GC. Check out the second
> comment at this link by Aaron Morton; he says that more than 300GB can be
> problematic. Though this post is about an older version of Cassandra, I
> believe the concept still holds:
>
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-it-safe-to-stop-a-read-repair-and-any-suggestion-on-speeding-up-repairs-td6607367.html
>
> Thanks
>
> On May 29, 2013, at 9:32 PM, srmore <comomore@gmail.com> wrote:
>
> Hello,
> I am observing that my performance is drastically decreasing as my data
> size grows. I have a 3-node cluster with 64 GB of RAM, and my data size is
> around 400GB on all the nodes. I also see that when I restart Cassandra,
> the performance goes back to normal and then starts decreasing again after
> some time.
>
> Some hunting led me to this page
> http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks
> about large data sets and explains that it might be because I am going
> through multiple layers of OS cache, but it does not tell me how to tune it.
>
> So, my question is: are there any optimizations that I can do to handle
> these large datasets?
>
> And why does my performance go back to normal when I restart Cassandra?
>
> Thanks!
>
>
>
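
The "add more nodes" advice quoted above can be put in rough numbers. The
sketch below assumes the ~400GB figure is per node, that the total stored
data (replicas included) stays the same with an unchanged replication
factor, and that the data rebalances evenly across the new nodes; the
method name nodesNeeded is hypothetical, and the 300GB target is the
figure attributed to Aaron Morton in the thread.

```java
// Sketch: smallest cluster size that keeps per-node load at or below a target.
public class NodeCountEstimate {

    /**
     * Assumes total stored data (replicas included) is unchanged and spread
     * evenly across nodes after rebalancing.
     */
    public static int nodesNeeded(int currentNodes, double gbPerNode, double targetGbPerNode) {
        if (currentNodes <= 0 || gbPerNode < 0 || targetGbPerNode <= 0) {
            throw new IllegalArgumentException("node count and target must be positive");
        }
        double totalGb = currentNodes * gbPerNode;
        return (int) Math.ceil(totalGb / targetGbPerNode);
    }

    public static void main(String[] args) {
        // Figures from this thread: 3 nodes at ~400GB each, ~300GB guideline.
        int nodes = nodesNeeded(3, 400, 300);
        System.out.printf("%d nodes -> %.0f GB per node%n", nodes, 3 * 400.0 / nodes);
        // prints: 4 nodes -> 300 GB per node
    }
}
```

Landing exactly on the 300GB figure leaves no headroom while the data keeps
growing, so a real plan would likely aim somewhat lower.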

--001a11c2013eb1cb5704de407d4f--