incubator-cassandra-user mailing list archives

From Wei Zhu <wz1...@yahoo.com>
Subject Re: About the heap
Date Wed, 13 Mar 2013 18:37:25 GMT
It's not BloomFilter. 

Cassandra will read through sstable index files on start-up, doing what is known as "index
sampling". This is used to keep a subset (currently and by default, 1 out of 100) of keys
and their on-disk location in the index, in memory. See ArchitectureInternals. This means
that the larger the index files are, the longer it takes to perform this sampling. Thus, for
very large indexes (typically when you have a very large number of keys) the index sampling
on start-up may be a significant issue.

http://wiki.apache.org/cassandra/LargeDataSetConsiderations
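
To make the idea concrete, here is a minimal sketch of index sampling in Python. It is not
Cassandra's code: the function names, the key format and the 32-byte entry spacing are all
made up for illustration, and it orders entries by key string where a real node would use
its partitioner's ordering. It only shows the principle of keeping 1 entry out of N in memory
and using it to find where an on-disk scan should start.

```python
import bisect


def build_sample(index_entries, interval=100):
    """Keep every `interval`-th (key, offset) pair from a sorted index."""
    return index_entries[::interval]


def nearest_sample_offset(sample, key):
    """Return the on-disk offset from which a scan for `key` would start.

    Finds the last sampled key that is <= `key`. Returns None when `key`
    sorts before the first sampled key.
    """
    keys = [k for k, _ in sample]
    pos = bisect.bisect_right(keys, key) - 1
    return sample[pos][1] if pos >= 0 else None


# Hypothetical index: 1,000 sorted keys, entries assumed 32 bytes apart on disk.
index_entries = [(f"key{i:06d}", i * 32) for i in range(1000)]

# Only this sample is held in memory; the full index stays on disk.
sample = build_sample(index_entries, interval=100)

print(len(sample))
print(nearest_sample_offset(sample, "key000250"))
```

Building the sample means reading the whole index once, which is why start-up time grows
with the size of the index files.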

-Wei

----- Original Message -----
From: "Alain RODRIGUEZ" <arodrime@gmail.com>
To: user@cassandra.apache.org
Sent: Wednesday, March 13, 2013 11:28:28 AM
Subject: Re: About the heap


"called index_interval set to 128"


I think this is for BloomFilters actually. 



2013/3/13 Hiller, Dean < Dean.Hiller@nrel.gov > 


Going to 1.2.2 helped us quite a bit, as did switching from STCS to LCS, which gave us smaller
bloomfilters.

As for the key cache: there is an entry in cassandra.yaml called index_interval set to 128.
I am not sure whether it is related to key_cache; I think it is. By raising it to 512 or maybe
even 1024, you should consume less RAM there as well. However, I ran this test in QA and my key
cache size stayed the same, so I am really not sure (I am actually checking out the Cassandra
code now to dig a little deeper into this property).
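
A rough sketch of why a larger index_interval should use less RAM, assuming the setting
controls how many index entries are kept in memory (1 out of every index_interval keys).
The key count and the bytes-per-entry figure below are invented placeholders, not measurements
from any cluster; only the ratio between the three results matters.

```python
def sampled_entries(total_keys, index_interval):
    """Number of index entries kept in memory: one per `index_interval` keys."""
    return (total_keys + index_interval - 1) // index_interval  # integer ceiling


def sample_heap_bytes(total_keys, index_interval, bytes_per_entry):
    """Very rough memory estimate for the in-memory sample."""
    return sampled_entries(total_keys, index_interval) * bytes_per_entry


TOTAL_KEYS = 1_000_000_000  # hypothetical number of keys on one node
BYTES_PER_ENTRY = 64        # assumed per-entry cost, not a measured value

for interval in (128, 512, 1024):
    entries = sampled_entries(TOTAL_KEYS, interval)
    megabytes = sample_heap_bytes(TOTAL_KEYS, interval, BYTES_PER_ENTRY) / (1024 * 1024)
    print(f"index_interval={interval}: {entries} entries, ~{megabytes:.0f} MB")
```

Under these assumptions, going from 128 to 512 cuts the number of sampled entries by a factor
of four, and going to 1024 by a factor of eight.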

Dean 

From: Alain RODRIGUEZ <arodrime@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, March 13, 2013 10:11 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: About the heap
Subject: About the heap 



Hi, 

I would like to know everything that is in the heap. 

We are talking about C* 1.1.6 here.

Theory : 

- Memtable (1024 MB) 
- Key Cache (100 MB) 
- Row Cache (disabled, and serialized with JNA activated anyway, so should be off-heap) 
- BloomFilters (about 1.03 GB, from cfstats: the sum of all the "Bloom Filter Space Used"
values, assuming they are shown in bytes, is 1103765112)
- Anything else ? 

So my heap should be fluctuating between 1.15 GB and 2.15 GB and growing slowly (from the
new Bloom filters created for my new data).
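
The arithmetic behind that estimate can be checked with a few lines of Python. The sizes are
the ones listed under "Theory"; treating MB and GB as binary units (1 GB = 1024^3 bytes) and
treating the memtables as anywhere between empty and full are assumptions made for this sketch.

```python
MB = 1024 * 1024
GB = 1024 * MB

memtable_max = 1024 * MB       # memtable limit from the list above
key_cache = 100 * MB           # key cache size from the list above
bloom_filters = 1_103_765_112  # summed "Bloom Filter Space Used", in bytes

# Lower bound: memtables just flushed. Upper bound: memtables full.
low = key_cache + bloom_filters
high = low + memtable_max

print(f"expected heap: {low / GB:.2f} GB to {high / GB:.2f} GB")
```

With binary units this comes out at roughly 1.13 GB to 2.13 GB, close to the range quoted
above and still far below the 3-8 GB actually observed.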

My heap actually fluctuates between 3-4 GB and 6 GB, and sometimes grows to the 8 GB maximum
(crashing the node).

Because of this I have an unstable cluster and have no choice but to use Amazon EC2 xLarge
instances, when we would rather use twice as many EC2 Large nodes.

What am I missing ? 

Practice : 

Is there an easy way, one that does not induce any load, to dump the heap and analyse it with
MAT (or anything else you could advise)?

Alain 


