Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of tivv00@gmail.com designates
 74.125.82.172 as permitted sender)
Message-ID: <4F6A0B52.2020400@gmail.com>
Date: Wed, 21 Mar 2012 19:09:38 +0200
From: Vitalii Tymchyshyn <tivv00@gmail.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
 rv:1.9.2.27) Gecko/20120216 Lightning/1.0b2 Thunderbird/3.1.19
MIME-Version: 1.0
To: user@cassandra.apache.org
CC: A J <s5alye@gmail.com>
Subject: Re: Max # of CFs
References: 
 <CAGXoe2UjeSNfBWf4ySP5+CH8wabo8iQfsCUBSd=w0fKjD4JuJA@mail.gmail.com>
	<CA+VSrLqy9G5P-h9kKo8dx7VcjMVDX_vrVa9dd40igAgqccCGqQ@mail.gmail.com>
	<CAGXoe2XXZAYagV1xBDUaTU0-xkTy80ZhPM83Ukrorukb+stqNw@mail.gmail.com>
	<4F689D92.8080509@gmail.com>
	<CAGXoe2X_o5u9brusTavNM7Qsb7jx_mFZmdjpZ+=WRDVAuQDczw@mail.gmail.com>
	<4F69D016.7000505@gmail.com>
 <CAGXoe2XKCUCAdLwQrsZfAg0WNLH8dtCR-4BocTY1YrqwjkcuZQ@mail.gmail.com>
In-Reply-To: 
 <CAGXoe2XKCUCAdLwQrsZfAg0WNLH8dtCR-4BocTY1YrqwjkcuZQ@mail.gmail.com>
Content-Type: text/plain; charset=KOI8-R; format=flowed
Content-Transfer-Encoding: 8bit

There is a forced flusher that kicks in when your heap becomes full.
Look for log lines from GCInspector.
There is a bug that prevents a memtable from being flushed when it contains
only full-key delete mutations, see https://issues.apache.org/jira/browse/CASSANDRA-3741
For me it happened when we started moving to a new schema, so the old
column families started to receive delete-only operations. An
indication is that GCInspector can't flush anything but the system keyspace.
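
To pull the GCInspector lines out of a log file, here is a minimal Python sketch. It is my own illustration, not part of Cassandra: the function names are hypothetical, the log path is whatever your installation uses, and it only matches on the string "GCInspector" without assuming anything about the rest of the line format.

```python
from typing import Iterable, List


def gcinspector_lines(log_lines: Iterable[str]) -> List[str]:
    """Keep only the lines that mention GCInspector, without trailing newlines."""
    return [line.rstrip("\n") for line in log_lines if "GCInspector" in line]


def read_gcinspector(path: str) -> List[str]:
    """Read a log file and return its GCInspector lines.

    The path is supplied by the caller; where the log lives depends on
    the installation, so nothing is hard-coded here.
    """
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return gcinspector_lines(handle)
```

Call read_gcinspector() with the path to your node's log and read through what comes back; if the forced flushes only ever touch the system keyspace, that matches the symptom described above.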

21.03.12 17:29, A J wrote:
> I have increased index_interval. Will let you know if I see a difference.
>
>
> My theory is that memtables are not getting flushed. If I manually
> flush them, the heap consumption goes down drastically.
>
> I think when memtable_total_space_in_mb is exceeded not enough
> memtables are getting flushed. There are 5000 memtables (one for each
> CF) but each memtable in itself is small. So flushing of one or two
> memtables by Cassandra is not helping.
>
> Question: How many memtables are flushed when
> memtable_total_space_in_mb is exceeded ? Any way to flush all
> memtables when the threshold is reached ?
>
> Thanks.
>
> On Wed, Mar 21, 2012 at 8:56 AM, Vitalii Tymchyshyn<tivv00@gmail.com>  wrote:
>> Hello.
>>
>> There is also a primary row index. Its space can be controlled with the
>> index_interval setting. I don't know if you can look up its memory usage
>> somewhere. If I were you, I'd take the jmap tool and examine a heap
>> histogram first, a heap dump second.
>>
>> Best regards, Vitalii Tymchyshyn
>>
>> 20.03.12 18:12, A J wrote:
>>
>>> I have both row cache and column cache disabled for all my CFs.
>>>
>>> cfstats says "Bloom Filter Space Used: 1760" per CF. Assuming it is in
>>> bytes, it is total of about 9MB of bloom filter size for 5K CFs; which
>>> is not a lot.
>>>
>>>
>>> On Tue, Mar 20, 2012 at 11:09 AM, Vitalii Tymchyshyn<tivv00@gmail.com> wrote:
>>>> Hello.
>>>>
>>>> From my experience it's unwise to create many column families for the
>>>> same keys, because you will have bloom filters and row indexes
>>>> multiplied. If you have 5000, you should expect your heap requirements
>>>> to be multiplied by the same factor. Also check your cache sizes. The
>>>> default, AFAIR, is 100000 keys per column family.
>>>>
>>>> 20.03.12 16:05, A J wrote:
>>>>
>>>>> OK, the last thread says that from 1.0 onwards, thousands of CFs
>>>>> should not be a problem.
>>>>>
>>>>> But I am finding that all the allocated heap memory is getting consumed.
>>>>> I started with 8GB heap and then on reading
>>>>>
>>>>>
>>>>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
>>>>> realized that a minimum of 1MB per memtable is used by the per-memtable
>>>>> arena allocator.
>>>>> So with 5K CFs, 5GB will be used just by arena allocators.
>>>>>
>>>>> But even after increasing the heap to 16GB, I am finding that all the
>>>>> heap is getting consumed. Is there a different formula for heap
>>>>> calculation when you have thousands of CFs?
>>>>> Is there any other configuration that I need to change?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On Mon, Mar 19, 2012 at 10:35 AM, Alain RODRIGUEZ<arodrime@gmail.com> wrote:
>>>>>> This subject has already been discussed; this may help you:
>>>>>>
>>>>>>
>>>>>> http://markmail.org/message/6dybhww56bxvufzf#query:+page:1+mid:6dybhww56bxvufzf+state:results
>>>>>>
>>>>>> If you still have questions after reading this thread or others on
>>>>>> the same topic, do not hesitate to ask again,
>>>>>>
>>>>>> Alain
>>>>>>
>>>>>>
>>>>>> 2012/3/19 A J<s5alye@gmail.com>
>>>>>>> How many Column Families are one too many for Cassandra ?
>>>>>>> I created a db with 5000 CFs (I can go into the reasons later) but the
>>>>>>> latency seems to be very erratic now. Not sure if it is because of the
>>>>>>> number of CFs.
>>>>>>>
>>>>>>> Thanks.
>>>>>>