Subject: Re: Cassandra performance decreases drastically with increase in data size.
From: srmore
To: user@cassandra.apache.org
Date: Mon, 3 Jun 2013 09:07:54 -0500
In-Reply-To: <0DAD2A4E-F7B6-4421-B955-708666BA6887@grapheffect.com>

Thanks all for the help.

I ran the traffic over the weekend. Surprisingly, my heap was doing OK (around 5.7G of 8G), but GC activity went nuts and dropped the throughput. I will probably increase the number of nodes.

The other interesting thing I noticed was that there were some objects with finalize() methods; this could potentially cause GC issues.

On Fri, May 31, 2013 at 1:47 AM, Aiman Parvaiz wrote:

> I believe you should roll out more nodes as a temporary fix to your
> problem. 400GB on all nodes means (as correctly mentioned in other mails
> of this thread) you are spending more time on GC. Check out the second
> comment in this link by Aaron Morton; he says that more than 300GB can be
> problematic. The post is about an older version of Cassandra, but I
> believe the concept still holds:
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-it-safe-to-stop-a-read-repair-and-any-suggestion-on-speeding-up-repairs-td6607367.html
>
> Thanks
>
> On May 29, 2013, at 9:32 PM, srmore wrote:
>
> Hello,
> I am observing that my performance decreases drastically as my data size
> grows. I have a 3 node cluster with 64 GB of RAM, and my data size is
> around 400GB on all the nodes. I also see that when I restart Cassandra
> the performance goes back to normal and then starts decreasing again
> after some time.
>
> Some hunting led me to this page
> http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks
> about large data sets and explains that it might be because I am going
> through multiple layers of OS cache, but it does not tell me how to tune
> it.
>
> So, my question is: are there any optimizations that I can do to handle
> these large data sets?
>
> And why does my performance go back to normal when I restart Cassandra?
>
> Thanks!
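For anyone who wants to put a number on "GC activity" rather than eyeball it, here is a minimal sketch using the standard java.lang.management MXBeans. It is not from the thread: the class name GcActivity and the helpers totalGcMillis and gcFraction are hypothetical, and it measures the JVM it runs in. Reading the same beans from a running Cassandra node over remote JMX is assumed to be possible but is not shown. The collection time reported by the JVM is approximate, so treat the result as a rough indicator only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcActivity {

    /** Sums the approximate accumulated GC time (ms) over all collectors of this JVM. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of a wall-clock interval spent in GC; 0.0 if the interval is empty. */
    public static double gcFraction(long gcMillisDelta, long wallMillisDelta) {
        if (wallMillisDelta <= 0) {
            return 0.0;
        }
        return (double) gcMillisDelta / wallMillisDelta;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long wallBefore = System.nanoTime();

        // Generate short-lived garbage so the collectors have something to do.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[16 * 1024];
            junk[0] = (byte) i;
            checksum += junk[0];
        }

        long wallMs = (System.nanoTime() - wallBefore) / 1_000_000;
        long gcMs = totalGcMillis() - gcBefore;
        System.out.printf("GC time: %d ms of %d ms wall clock (%.1f%%), checksum=%d%n",
                gcMs, wallMs, 100.0 * gcFraction(gcMs, wallMs), checksum);
    }
}
```

Sampling totalGcMillis at regular intervals and comparing the deltas against wall-clock time would show whether the share of time spent in GC climbs as the data size grows, even while heap occupancy (5.7G of 8G here) looks healthy.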