incubator-cassandra-user mailing list archives

From Paul Ingalls <>
Subject Re: understanding memory footprint
Date Fri, 16 Aug 2013 04:53:56 GMT
Hey Aaron,

I went ahead and changed the model around to reduce the number of CFs from around 60 or so
to 7, but I'm still running into OOM messages and eventual node crashes after I've pushed
in about 30GB of data per node.  And it seems that, under load, once one node goes down, the
others seem to follow within a few minutes.  It's like the cluster just hits a wall.

Some details from my cluster.  As you can see, most are at defaults.  Let me know if you need
more data.

5 nodes running 1.2.8 on 4-CPU VMs with 7GB RAM
750GB RAID 0 disk

num_tokens = 256
key_cache_size_in_mb is empty, so using the default
row cache is disabled
commitlog_segment_size_in_mb: 32
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 32
commitlog_total_space_in_mb: 768 (I reduced this a bit)
memtable_flush_queue_size: 5
rpc_server_type: hsha
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
compaction_throughput_mb_per_sec: 16

One of my tables creates a lot of large rows.  Would it make sense to change the partition
key to break the large rows into more, smaller rows?
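
To make the large-row question concrete, here is a minimal Python sketch of one common way to split a wide row: adding a time bucket to the partition key. It assumes the large rows grow over time and that each write carries a timestamp, which the details above do not confirm. The names `bucketed_partition_key`, `entity_id`, and `bucket_hours` are hypothetical and only for illustration.

```python
from datetime import datetime, timezone


def bucketed_partition_key(entity_id, ts, bucket_hours=24):
    """Return a composite (entity_id, bucket) partition key.

    All writes for one entity inside the same bucket_hours window share a
    partition; a new window starts a new, smaller partition.
    `ts` must be a timezone-aware datetime (assumption: UTC timestamps).
    """
    epoch_hours = int(ts.timestamp()) // 3600
    return (entity_id, epoch_hours // bucket_hours)


if __name__ == "__main__":
    ts = datetime(2013, 8, 16, 4, 53, 56, tzinfo=timezone.utc)
    print(bucketed_partition_key("sensor-1", ts))
```

The trade-off is on the read side: a query spanning several windows has to read one partition per bucket instead of a single large row.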

Thanks for the help!


On Aug 14, 2013, at 8:05 PM, Aaron Morton <> wrote:

>> "Does the number of column families still significantly impact the memory footprint?
>> If so, what is the incremental cost of a column family/table?"
> IMHO there would be little difference in memory use between a node with zero data that had
> 10 CFs and one that had 100 CFs. When you start putting data in, the story changes.
> As Alain said, the number of rows can impact the memory use. In 1.2+ that's less of an
> issue, but the index samples are still on heap. In my experience, with a normal heap (4GB
> to 8GB), this is not an issue until you get into 500+ million rows.
> The number of CFs is still used when calculating when to flush to disk. If you have
> 100 CFs, the server will flush to disk more frequently than if you have 10, because it needs
> to leave more room for the memtables to grow.
> The best way to get help on this is to provide details on the memory settings, the number
> of CFs, the total number of rows, and the cache settings.
> Hope that helps. 
> -----------------
> Aaron Morton
> Cassandra Consultant
> New Zealand
> @aaronmorton
> On 13/08/2013, at 9:10 PM, Alain RODRIGUEZ <> wrote:
>> If you are using 1.2.*, Bloom filters are in native memory, so they are not pressuring
>> your heap. How much data do you have per node? If that amount is large, the index samples
>> in the heap are consuming a lot of memory, for sure, and growing as your data per node grows.
>> Solutions: increase the heap if it is < 8GB, and/or raise the sampling index_interval
>> from 128 to a bigger value (256 - 512), and/or wait for 2.0.*, which, off the top of my
>> head, should move the sampling into native memory, allowing heap size to be independent of
>> the data size per node.
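
As a rough illustration of why a larger index_interval helps, the sketch below estimates how many index samples a node keeps on heap at the three intervals mentioned. It is a back-of-the-envelope calculation, not Cassandra's actual accounting: it assumes one sample per index_interval rows and that each row lives in a single SSTable (rows spread over several SSTables would be sampled in each). The function name `estimated_index_samples` is hypothetical, and the 500 million row count is the threshold Aaron mentions later in the thread, used here only as an example input.

```python
def estimated_index_samples(rows_per_node, index_interval=128):
    """Approximate number of on-heap index samples for a node.

    Assumption: one sample is kept for every `index_interval` rows.
    """
    return rows_per_node // index_interval


if __name__ == "__main__":
    rows = 500_000_000  # example input only
    for interval in (128, 256, 512):
        print(interval, estimated_index_samples(rows, interval))
```

Doubling the interval halves the sample count, at the cost of scanning more index entries on disk per lookup.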
>> This should alleviate things. Yet these are only guesses, since I know almost nothing
>> about your cluster...
>> Hope this helps somehow.
>> 2013/8/12 Robert Coli <>
>> On Mon, Aug 12, 2013 at 11:14 AM, Paul Ingalls <> wrote:
>> I don't really need exact numbers, just a rough cost would be sufficient.  I'm running
>> into memory problems on my cluster, and I'm trying to decide if reducing the number of column
>> families would be worth the effort.  Looking at the rule of thumb from the wiki entry made
>> it seem like reducing the number of tables would make a big impact, but I'm running 1.2.8
>> so not sure if it is still true.
>> Is there a new rule of thumb?
>> If you want a cheap/quick measure of how much space partially full memtables are
>> taking, just nodetool flush and check heap usage before and after?
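
The flush-and-compare check could be scripted roughly as follows. This is a sketch under assumptions: it expects `nodetool` on the PATH and assumes `nodetool info` prints a line of the form `Heap Memory (MB) : used / total`; if your version formats it differently, the pattern needs adjusting. The helper names are hypothetical.

```python
import re
import subprocess

# Assumed format of the heap line in `nodetool info` output.
HEAP_LINE = re.compile(
    r"^\s*Heap Memory \(MB\)\s*:\s*([\d.]+)\s*/\s*([\d.]+)", re.MULTILINE
)


def parse_heap_used_mb(info_output):
    """Extract the used-heap figure (MB) from `nodetool info` text."""
    match = HEAP_LINE.search(info_output)
    if match is None:
        raise ValueError("no 'Heap Memory (MB)' line found")
    return float(match.group(1))


def heap_used_mb():
    """Run `nodetool info` and return used heap in MB."""
    result = subprocess.run(
        ["nodetool", "info"], capture_output=True, text=True, check=True
    )
    return parse_heap_used_mb(result.stdout)


def heap_freed_by_flush_mb():
    """Return used heap before the flush minus used heap after it."""
    before = heap_used_mb()
    subprocess.run(["nodetool", "flush"], check=True)
    return before - heap_used_mb()
```

The used-heap figure also includes garbage that has not been collected yet, so the difference is only a rough signal; repeating the measurement a few times gives a better picture.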
>> If you want a cheap/quick measure of how much space empty sstables take in heap,
>> I think you're out of luck.
>> =Rob
