From: Michael Theroux
Subject: Re: About the heap
Date: Thu, 14 Mar 2013 11:44:44 -0400
To: user@cassandra.apache.org

Hi Aaron,

If you have the chance, could you expand on m1.xlarge being the much better choice?  We will soon need to choose between expanding from a 12 node to a 24 node cluster on m1.large instances and upgrading all instances to m1.xlarge, so the justifications would be helpful (although "Aaron says so" does help ;) ).

One obvious reason is that administering a 24 node cluster adds person-time overhead.

Another reason is the reduced impact of maintenance activities such as repair, since these activities carry significant CPU overhead.  Doubling the cluster size would, in theory, halve the time spent on them, but performance would still be affected during that time.  Going to m1.xlarge would lessen the impact of these activities on operations.

Anything = else?

Thanks,

-Mike

On Mar 14, 2013, at 9:27 AM, aaron morton = wrote:

>> Because of this I have an unstable cluster and have no other choice but to use Amazon EC2 xlarge instances when we would rather use twice as many EC2 large nodes.
> m1.xlarge is a MUCH better choice than m1.large.
> You get more RAM, better IO, and less steal. Using half as many m1.xlarge is the way to go.

>> My heap is actually changing from 3-4 GB to 6 GB and sometimes growing to the max 8 GB (crashing the node).
> How is it crashing?
> Are you getting too much GC or running OOM?
> Are you using the default GC configuration?
> Is Cassandra logging a lot of GC warnings?
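
> (A quick way to check for those warnings, assuming the default log location; adjust the path for your install:)
>
>     grep GCInspector /var/log/cassandra/system.log | tail -20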

> If you are running OOM then something has to change. Maybe bloom filters, maybe caches.
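
> (As a sketch only, with placeholder names: on 1.1 the per-CF bloom filter false positive chance can be raised from cassandra-cli, then "nodetool upgradesstables MyKeyspace MyCF" run on each node so the existing filters are rebuilt smaller:)
>
>     update column family MyCF with bloom_filter_fp_chance = 0.1;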

> Enable the GC logging in cassandra-env.sh to check how low a CMS compaction gets the heap, or use some other tool. That will give an idea of how much memory you are using.
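
> (From memory, cassandra-env.sh ships with the relevant lines commented out; uncommenting something like the following and restarting the node is enough. The log path is only an example:)
>
>     JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
>     JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
>     JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
>     JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
>     JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
>     JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"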

> Here is some background on what is kept on heap pre-1.2:
> http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html

> Cheers

> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> @aaronmorton
> http://www.thelastpickle.com

> On 13/03/2013, at 12:19 PM, Wei Zhu <wz1975@yahoo.com> wrote:

>> Here is the JIRA I submitted regarding the ancestors.

>> https://issues.apache.org/jira/browse/CASSANDRA-5342

>> -Wei


>> ----- Original Message -----
>> From: "Wei Zhu" <wz1975@yahoo.com>
>> To: user@cassandra.apache.org
>> Sent: Wednesday, March 13, 2013 11:35:29 AM
>> Subject: Re: About the heap

>> Hi Dean,
>> index_interval controls the sampling of the SSTable index, which speeds up the lookup of keys in the SSTable. Here is the code:

>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DataTracker.java#L478

>> Increasing the interval means taking fewer samples, which uses less memory but makes key lookups for reads slower.
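
>> (For reference, the setting lives in cassandra.yaml; 128 is the default, and the value below is only an illustration:)
>>
>>     # larger interval = fewer index samples held on heap,
>>     # at the cost of slightly slower key lookups
>>     index_interval: 512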

>> I did do a heap dump on my production system, which caused about a 10 second pause of the node. I found something interesting: for LCS, one compaction can involve thousands of SSTables, and the ancestors are recorded in case something goes wrong during the compaction. But those are never removed after the compaction is done. In our case it takes about 1 GB of heap memory to store them. I am going to submit a JIRA for that.
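
>> (For anyone wanting to inspect their own heap in MAT, a dump can be taken with jmap; the "live" option triggers a full GC first, which is likely where a pause like that comes from. The path and pid are placeholders:)
>>
>>     jmap -dump:live,format=b,file=/tmp/cassandra-heap.hprof <cassandra-pid>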

>> Here is the culprit:

>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableMetadata.java#L58

>> Enjoy looking at the Cassandra code :)

>> -Wei


>> ----- Original Message -----
>> From: "Dean Hiller" <Dean.Hiller@nrel.gov>
>> To: user@cassandra.apache.org
>> Sent: Wednesday, March 13, 2013 11:11:14 AM
>> Subject: Re: About the heap

>> Going to 1.2.2 helped us quite a bit, as did switching from STCS to LCS, which gave us smaller bloom filters.
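
>> (On 1.2 the switch can be made per table with CQL3, roughly as below; the keyspace/table names are placeholders, and existing data is re-levelled by compaction afterwards:)
>>
>>     ALTER TABLE my_keyspace.my_table
>>       WITH compaction = {'class': 'LeveledCompactionStrategy'};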

>> As far as key cache: there is an entry in cassandra.yaml called index_interval, set to 128. I am not sure if that is related to key_cache; I think it is. By turning that up to 512 or maybe even 1024 you will consume less RAM there as well, though I ran this test in QA and my key cache size stayed the same, so I am really not sure. (I am actually checking out the Cassandra code now to dig a little deeper into this property.)

>> Dean

>> From: Alain RODRIGUEZ <arodrime@gmail.com>
>> Reply-To: user@cassandra.apache.org
>> Date: Wednesday, March 13, 2013 10:11 AM
>> To: user@cassandra.apache.org
>> Subject: About the heap

>> Hi,

>> I would like to know everything that is in the heap.

>> We are here speaking of C* 1.1.6.

>> Theory:

>> - Memtable (1024 MB)
>> - Key Cache (100 MB)
>> - Row Cache (disabled, and serialized with JNA activated anyway, so it should be off-heap)
>> - Bloom filters (about 1.03 GB - from cfstats, adding up all the "Bloom Filter Space Used" values, which are shown in bytes - 1103765112 in total; see the one-liner after this list)
>> - Anything else?
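
>> (Something like this adds them up; the value is the last field of each matching cfstats line, but check the output format of your version and adjust if needed:)
>>
>>     nodetool cfstats | grep "Bloom Filter Space Used" | awk '{sum += $NF} END {print sum " bytes"}'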

>> So my heap should be fluctuating between 1.15 GB and 2.15 GB and growing slowly (from the bloom filters of my new data).

>> My heap is actually changing from 3-4 GB to 6 GB and sometimes growing to the max 8 GB (crashing the node).

>> Because of this I have an unstable cluster and have no other choice but to use Amazon EC2 xlarge instances when we would rather use twice as many EC2 large nodes.

>> What am I missing?

>> Practice:

>> Is there a way, easy to do and not inducing any load, to dump the heap and analyse it with MAT (or anything else you could advise)?

>> Alain


