incubator-cassandra-user mailing list archives

From Benedict Elliott Smith <belliottsm...@datastax.com>
Subject Re: memtable mem usage off by 10?
Date Wed, 04 Jun 2014 12:07:44 GMT
I'm confused: there is no flush_largest_memtables_at property in C* 2.0?


On 4 June 2014 12:55, Idrén, Johan <Johan.Idren@dice.se> wrote:

>  Ok, so the overhead is a constant modifier, right.
>
>
>  The 3x I arrived at with the following assumptions:
>
>
>  heap is 10GB
>
> Default memory for memtable usage is 1/4 of heap in c* 2.0
>  max memory used for memtables is 2,5GB (10/4)
>
> flush_largest_memtables_at is 0.75
>
> flush largest memtables when memtables use 7,5GB (3/4 of heap, 3x of the
> default)
>
>
>  With an overhead of 10x, it makes sense that my memtable is flushed when
> the JMX data says it is at ~250MB, i.e. 2,5GB on the heap, which is 1/4 of the heap
>
>
>  After I've set memtable_total_space_in_mb to a value larger than
> 7,5GB, it should still not go over 7,5GB on account of
> flush_largest_memtables_at, which is 3/4 of the heap
>
>
>  So I would expect to see memtables flushed to disk once they are
> reported at around 750MB.
>
>
>  Having memtable_total_space_in_mb set to 20480, memtables are flushed at
> a reported value of ~2GB.
>
>
>  With a constant overhead, this would mean that it used 20GB, which is 2x
> the size of the heap, instead of 3/4 of the heap as it should be if
> flush_largest_memtables_at was being respected.
>
>
>  This shouldn't be possible.
>
>
>  ------------------------------
> *From:* Benedict Elliott Smith <belliottsmith@datastax.com>
> *Sent:* Wednesday, June 4, 2014 1:19 PM
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: memtable mem usage off by 10?
>
>  Unfortunately it looks like the heap utilisation of memtables was not
> exposed in earlier versions, because they only maintained an estimate.
>
>  The overhead scales linearly with the amount of data in your memtables
> (assuming the size of each cell is approx. constant).
>
>  flush_largest_memtables_at is an independent setting to
> memtable_total_space_in_mb, and generally has little effect. Ordinarily
> sstable flushes are triggered by hitting the memtable_total_space_in_mb
> limit. I'm afraid I don't follow where your 3x comes from?
>
>
> On 4 June 2014 12:04, Idrén, Johan <Johan.Idren@dice.se> wrote:
>
>>  Aha, ok. Thanks.
>>
>>
>>  Trying to understand what my cluster is doing:
>>
>>
>>  cassandra.db.memtable_data_size only gets me the actual data but not
>> the memtable heap memory usage. Is there a way to check for heap memory
>> usage?
>>
>>
>>  I would expect to hit the flush_largest_memtables_at value, and this
>> would be what causes the memtable flush to sstable then? By default 0.75?
>>
>>
>>  Then I would expect the amount of memory to be used to be maximum ~3x
>> of what I was seeing when I hadn't set memtable_total_space_in_mb (1/4 by
>> default, max 3/4 before a flush), instead of close to 10x (250mb vs 2gb).
>>
>>
>> This is of course assuming that the overhead scales linearly with the
>> amount of data in my table, we're using one table with three cells in this
>> case. If it hardly increases at all, then I'll give up I guess :)
>>
>> At least until 2.1.0 comes out and I can compare.
>>
>>
>>  BR
>>
>> Johan
>>
>>
>>  ------------------------------
>>  *From:* Benedict Elliott Smith <belliottsmith@datastax.com>
>>  *Sent:* Wednesday, June 4, 2014 12:33 PM
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: memtable mem usage off by 10?
>>
>>   These measurements tell you the amount of user data stored in the
>> memtables, not the amount of heap used to store it, so the same applies.
>>
>>
>> On 4 June 2014 11:04, Idrén, Johan <Johan.Idren@dice.se> wrote:
>>
>>>  I'm not measuring memtable size by looking at the sstables on disk,
>>> no. I'm looking through the JMX data. So I would believe (or hope) that I'm
>>> getting relevant data.
>>>
>>>
>>>  If I have a heap of 10GB and set the memtable usage to 20GB, I would
>>> expect to hit other problems, but I'm not seeing memory usage over 10GB for
>>> the heap, and the machine (which has ~30gb of memory) is showing ~10GB
>>> free, with ~12GB used by cassandra, the rest in caches.
>>>
>>>
>>>  Reading 8k rows/s, writing 2k rows/s on a 3 node cluster. So it's not
>>> idling.
>>>
>>>
>>>  BR
>>>
>>> Johan
>>>
>>>
>>>  ------------------------------
>>> *From:* Benedict Elliott Smith <belliottsmith@datastax.com>
>>> *Sent:* Wednesday, June 4, 2014 11:56 AM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: memtable mem usage off by 10?
>>>
>>>   If you are storing small values in your columns, the object overhead
>>> is very substantial. So what is 400MB on disk may well be 4GB in memtables,
>>> so if you are measuring the memtable size by the resulting sstable size,
>>> you are not getting an accurate picture. This overhead has been reduced by
>>> about 90% in the upcoming 2.1 release, through tickets 6271
>>> <https://issues.apache.org/jira/browse/CASSANDRA-6271>, 6689
>>> <https://issues.apache.org/jira/browse/CASSANDRA-6689> and 6694
>>> <https://issues.apache.org/jira/browse/CASSANDRA-6694>.
>>>
>>>
>>> On 4 June 2014 10:49, Idrén, Johan <Johan.Idren@dice.se> wrote:
>>>
>>>>  Hi,
>>>>
>>>>
>>>>  I'm seeing some strange behavior of the memtables, both in 1.2.13 and
>>>> 2.0.7. Basically, it looks like Cassandra is using 10x less memory than it
>>>> should based on the documentation and options.
>>>>
>>>>
>>>>  10GB heap for both clusters.
>>>>
>>>> 1.2.x should use 1/3 of the heap for memtables, but it uses max ~300mb
>>>> before flushing
>>>>
>>>>
>>>>  2.0.7, same but 1/4 and ~250mb
>>>>
>>>>
>>>>  In the 2.0.7 cluster I set the memtable_total_space_in_mb to 4096,
>>>> which then allowed cassandra to use up to ~400mb for memtables...
>>>>
>>>>
>>>>  I'm now running with 20480 for memtable_total_space_in_mb and
>>>> cassandra is using ~2GB for memtables.
>>>>
>>>>
>>>>  Soo, off by 10 somewhere? Has anyone else seen this? Can't find a
>>>> JIRA for any bug connected to this.
>>>>
>>>> java 1.7.0_55, JNA 4.1.0 (for the 2.0 cluster)
>>>>
>>>>
>>>>  BR
>>>>
>>>> Johan
>>>>
>>>
>>>
>>
>
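
The arithmetic discussed in the thread can be laid out as a small sketch. It uses only the figures quoted above (10GB heap, the three memtable_total_space_in_mb settings, and the approximate data sizes reported via JMX at flush time) and assumes, as the thread does, that overhead is a constant 10x factor. The function and variable names are hypothetical, and this says nothing about how Cassandra itself accounts for memtable memory.

```python
HEAP_MB = 10 * 1024  # 10GB heap, as stated in the thread

# (memtable_total_space_in_mb, data size in MB reported via JMX at flush time)
# The reported sizes are the approximate figures quoted in the thread.
OBSERVATIONS = [
    (HEAP_MB // 4, 250),  # 2.0.7 default: 1/4 of the heap
    (4096, 400),
    (20480, 2048),        # "~2GB", taken here as 2048MB
]


def limit_to_reported_ratio(limit_mb, reported_mb):
    """Ratio between the configured limit and the reported data size."""
    return limit_mb / reported_mb


def implied_heap_mb(reported_mb, overhead):
    """Heap the memtables would occupy if overhead were a constant factor."""
    return reported_mb * overhead


def summarize(observations, heap_mb, overhead=10):
    """Build one row per observation, flagging results larger than the heap."""
    rows = []
    for limit_mb, reported_mb in observations:
        heap_used = implied_heap_mb(reported_mb, overhead)
        rows.append({
            "limit_mb": limit_mb,
            "reported_mb": reported_mb,
            "ratio": round(limit_to_reported_ratio(limit_mb, reported_mb), 2),
            "implied_heap_mb": heap_used,
            "exceeds_heap": heap_used > heap_mb,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(OBSERVATIONS, HEAP_MB):
        print(row)
```

The ratio between the configured limit and the reported size stays close to 10 in all three cases. Only the last row is flagged: under a constant 10x overhead, ~2GB of reported data would imply roughly 20GB of memtables in a 10GB heap, which is the inconsistency raised in the most recent message.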
