From: Benedict Elliott Smith <belliottsmith@datastax.com>
To: user@cassandra.apache.org
Date: Wed, 4 Jun 2014 16:18:30 +0100
Subject: Re: memtable mem usage off by 10?

In that case I would assume the problem is that for some reason JAMM is
failing to load, and so the liveRatio it would ordinarily calculate is
defaulting to 10 - are you using the bundled cassandra launch scripts?

On 4 June 2014 15:51, Idrén, Johan <Johan.Idren@dice.se> wrote:

> I wasn't supplying it, I was assuming it was using the default. It does
> not exist in my config file. Sorry for the confusion.
>
> From: Benedict Elliott Smith <belliottsmith@datastax.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday 4 June 2014 16:36
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: memtable mem usage off by 10?
>
>> Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry,
>> I was going by the documentation. It claims that the property is around
>> in 2.0.
>
> But something else is wrong, as Cassandra will crash if you supply an
> invalid property, implying it's not sourcing the config file you're using.
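[Editor's note: the liveRatio fallback Benedict describes is the crux of the thread, so a short illustration may help. Cassandra 1.2/2.0 tracks the serialized size of the data in each memtable and multiplies it by a measured "live ratio" to estimate real heap occupancy; when JAMM (the java agent that measures object sizes) fails to load, that ratio falls back to a fixed 10. The arithmetic below is an illustrative sketch, not Cassandra's actual code:]

```python
# Illustrative sketch of memtable flush accounting with liveRatio.
# Not Cassandra's implementation -- just the arithmetic it implies.

def reported_data_at_flush(space_limit_mb, live_ratio):
    """Serialized (JMX-reported) data size at which a flush triggers.

    The flusher compares raw_data * live_ratio against the configured
    memtable space limit, so the *reported* data size at flush is the
    limit divided by the live ratio.
    """
    return space_limit_mb / live_ratio

# 2.0 default limit: heap/4 = 2560 MB on a 10 GB heap.
# With JAMM absent, live_ratio falls back to 10:
flush_point = reported_data_at_flush(2560, live_ratio=10)
# -> 256 MB, matching the ~250 MB Johan observes via JMX.
```

With the explicit limit of 20480 MB, the same arithmetic gives 2048 MB, i.e. the ~2 GB reported flush size mentioned later in the thread.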
> I'm afraid I don't have the context for why it was removed, but it
> happened as part of the 2.0 release.
>
> On 4 June 2014 13:59, Jack Krupansky <jack@basetechnology.com> wrote:
>
>> Yeah, it is in the doc:
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html
>>
>> And I don't find a Jira issue mentioning it being removed, so... what's
>> the full story there?!
>>
>> -- Jack Krupansky
>>
>> *From:* Idrén, Johan
>> *Sent:* Wednesday, June 4, 2014 8:26 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* RE: memtable mem usage off by 10?
>>
>> Oh, well ok that explains why I'm not seeing a flush at 750MB. Sorry, I
>> was going by the documentation. It claims that the property is around in
>> 2.0.
>>
>> If we skip that, part of my reply still makes sense:
>>
>> Having memtable_total_space_in_mb set to 20480, memtables are flushed at
>> a reported value of ~2GB.
>>
>> With a constant overhead of ~10x, as suggested, this would mean that it
>> used 20GB, which is 2x the size of the heap.
>>
>> That shouldn't work. According to the OS, cassandra doesn't use more
>> than ~11-12GB.
>>
>> ------------------------------
>> *From:* Benedict Elliott Smith
>> *Sent:* Wednesday, June 4, 2014 2:07 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: memtable mem usage off by 10?
>>
>> I'm confused: there is no flush_largest_memtables_at property in C* 2.0?
>>
>> On 4 June 2014 12:55, Idrén, Johan <Johan.Idren@dice.se> wrote:
>>
>>> Ok, so the overhead is a constant modifier, right.
>>>
>>> The 3x I arrived at with the following assumptions:
>>>
>>> heap is 10GB
>>>
>>> default memory for memtable usage is 1/4 of heap in C* 2.0
>>>
>>> max memory used for memtables is 2.5GB (10/4)
>>>
>>> flush_largest_memtables_at is 0.75
>>>
>>> flush largest memtables when memtables use 7.5GB (3/4 of heap, 3x the
>>> default)
>>>
>>> With an overhead of 10x, it makes sense that my memtable is flushed
>>> when the JMX data says it is at ~250MB, i.e. 2.5GB, i.e. 1/4 of the heap.
>>>
>>> After I've set memtable_total_space_in_mb to a value larger than 7.5GB,
>>> it should still not go over 7.5GB on account of
>>> flush_largest_memtables_at, 3/4 of the heap.
>>>
>>> So I would expect to see memtables flushed to disk when they're
>>> reportedly at around 750MB.
>>>
>>> Having memtable_total_space_in_mb set to 20480, memtables are flushed
>>> at a reported value of ~2GB.
>>>
>>> With a constant overhead, this would mean that it used 20GB, which is
>>> 2x the size of the heap, instead of 3/4 of the heap as it should be if
>>> flush_largest_memtables_at was being respected.
>>>
>>> This shouldn't be possible.
>>>
>>> ------------------------------
>>> *From:* Benedict Elliott Smith
>>> *Sent:* Wednesday, June 4, 2014 1:19 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: memtable mem usage off by 10?
>>>
>>> Unfortunately it looks like the heap utilisation of memtables was not
>>> exposed in earlier versions, because they only maintained an estimate.
>>>
>>> The overhead scales linearly with the amount of data in your memtables
>>> (assuming the size of each cell is approx. constant).
>>>
>>> flush_largest_memtables_at is an independent setting to
>>> memtable_total_space_in_mb, and generally has little effect. Ordinarily
>>> sstable flushes are triggered by hitting the memtable_total_space_in_mb
>>> limit. I'm afraid I don't follow where your 3x comes from?
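[Editor's note: Benedict's distinction between the two settings can be summarised with a toy calculation. This models only the two thresholds under discussion, not Cassandra's actual flush logic, and assumes the 10 GB heap and 2.0 defaults quoted above:]

```python
# Simplified model of the two flush triggers discussed in the thread.
# NOT Cassandra's real code -- just the threshold arithmetic.

def flush_triggers(heap_gb, memtable_total_space_gb, flush_largest_at=0.75):
    """Return (normal, emergency) thresholds in GB of heap occupancy."""
    # Normal trigger: the total memtable space limit
    # (memtable_total_space_in_mb, defaulting to heap/4 in 2.0).
    normal = memtable_total_space_gb
    # Emergency valve: fraction of the *heap* at which the largest
    # memtable is flushed under memory pressure (flush_largest_memtables_at,
    # removed in the 2.0 release).
    emergency = flush_largest_at * heap_gb
    return normal, emergency

normal, emergency = flush_triggers(heap_gb=10, memtable_total_space_gb=2.5)
# With the 2.0 default (heap/4 = 2.5 GB), the normal trigger fires long
# before the 0.75 * heap = 7.5 GB valve ever could, which is why the
# valve "generally has little effect".
assert normal < emergency
```

This is why the 3x expectation does not apply: the two settings are independent, and the normal space limit is the one actually reached.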
>>>
>>> On 4 June 2014 12:04, Idrén, Johan <Johan.Idren@dice.se> wrote:
>>>
>>>> Aha, ok. Thanks.
>>>>
>>>> Trying to understand what my cluster is doing:
>>>>
>>>> cassandra.db.memtable_data_size only gets me the actual data, but not
>>>> the memtable heap memory usage. Is there a way to check for heap
>>>> memory usage?
>>>>
>>>> I would expect to hit the flush_largest_memtables_at value, and this
>>>> would be what causes the memtable flush to sstable then? By default
>>>> 0.75?
>>>>
>>>> Then I would expect the amount of memory used to be a maximum of ~3x
>>>> what I was seeing when I hadn't set memtable_total_space_in_mb (1/4 by
>>>> default, max 3/4 before a flush), instead of close to 10x (250MB vs
>>>> 2GB).
>>>>
>>>> This is of course assuming that the overhead scales linearly with the
>>>> amount of data in my table; we're using one table with three cells in
>>>> this case. If it hardly increases at all, then I'll give up I guess :)
>>>>
>>>> At least until 2.1.0 comes out and I can compare.
>>>>
>>>> BR
>>>> Johan
>>>>
>>>> ------------------------------
>>>> *From:* Benedict Elliott Smith
>>>> *Sent:* Wednesday, June 4, 2014 12:33 PM
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Re: memtable mem usage off by 10?
>>>>
>>>> These measurements tell you the amount of user data stored in the
>>>> memtables, not the amount of heap used to store it, so the same
>>>> applies.
>>>>
>>>> On 4 June 2014 11:04, Idrén, Johan <Johan.Idren@dice.se> wrote:
>>>>
>>>>> I'm not measuring memtable size by looking at the sstables on disk,
>>>>> no. I'm looking through the JMX data. So I would believe (or hope)
>>>>> that I'm getting relevant data.
>>>>>
>>>>> If I have a heap of 10GB and set the memtable usage to 20GB, I would
>>>>> expect to hit other problems, but I'm not seeing memory usage over
>>>>> 10GB for the heap, and the machine (which has ~30GB of memory) is
>>>>> showing ~10GB free, with ~12GB used by cassandra, the rest in caches.
>>>>>
>>>>> Reading 8k rows/s, writing 2k rows/s on a 3 node cluster. So it's not
>>>>> idling.
>>>>>
>>>>> BR
>>>>> Johan
>>>>>
>>>>> ------------------------------
>>>>> *From:* Benedict Elliott Smith
>>>>> *Sent:* Wednesday, June 4, 2014 11:56 AM
>>>>> *To:* user@cassandra.apache.org
>>>>> *Subject:* Re: memtable mem usage off by 10?
>>>>>
>>>>> If you are storing small values in your columns, the object overhead
>>>>> is very substantial. So what is 400MB on disk may well be 4GB in
>>>>> memtables, so if you are measuring the memtable size by the resulting
>>>>> sstable size, you are not getting an accurate picture. This overhead
>>>>> has been reduced by about 90% in the upcoming 2.1 release, through
>>>>> tickets CASSANDRA-6271, CASSANDRA-6689 and CASSANDRA-6694.
>>>>>
>>>>> On 4 June 2014 10:49, Idrén, Johan <Johan.Idren@dice.se> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm seeing some strange behavior of the memtables, both in 1.2.13
>>>>>> and 2.0.7; basically it looks like it's using 10x less memory than
>>>>>> it should based on the documentation and options.
>>>>>>
>>>>>> 10GB heap for both clusters.
>>>>>>
>>>>>> 1.2.x should use 1/3 of the heap for memtables, but it uses max
>>>>>> ~300MB before flushing.
>>>>>>
>>>>>> 2.0.7, same but 1/4 and ~250MB.
>>>>>>
>>>>>> In the 2.0.7 cluster I set memtable_total_space_in_mb to 4096, which
>>>>>> then allowed cassandra to use up to ~400MB for memtables...
>>>>>>
>>>>>> I'm now running with 20480 for memtable_total_space_in_mb and
>>>>>> cassandra is using ~2GB for memtables.
>>>>>>
>>>>>> Soo, off by 10 somewhere? Has anyone else seen this? Can't find a
>>>>>> JIRA for any bug connected to this.
>>>>>>
>>>>>> java 1.7.0_55, JNA 4.1.0 (for the 2.0 cluster)
>>>>>>
>>>>>> BR
>>>>>> Johan
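[Editor's note: taken together, every datapoint reported in the thread is consistent with a constant ratio of roughly 10 between the configured memtable space limit and the reported flush size, which is exactly what a liveRatio stuck at its fallback value of 10 would produce. A quick consistency check, with observed values approximated from the messages and the default limits assuming the 10 GB heap described:]

```python
# (configured limit in MB, reported flush size in MB) pairs from the thread.
# Default limits assume a 10 GB (10240 MB) heap.
observations = [
    (10240 / 3, 300),   # 1.2.x default: heap/3, flushes at ~300 MB
    (10240 / 4, 250),   # 2.0.7 default: heap/4, flushes at ~250 MB
    (4096,      400),   # explicit memtable_total_space_in_mb: 4096
    (20480,    2048),   # explicit 20480, flushes at ~2 GB
]

for limit, reported in observations:
    ratio = limit / reported
    # Every ratio comes out close to 10 -- the liveRatio fallback value.
    assert 8 < ratio < 12, (limit, reported, ratio)
```

So the memory is not "off by 10" at all: the JMX figure is serialized data size, and the flusher is (correctly, given a missing JAMM agent) assuming a 10x heap overhead on top of it.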