From: Benedict Elliott Smith <belliottsmith@datastax.com>
Date: Wed, 4 Jun 2014 12:19:16 +0100
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

Unfortunately it looks like the heap utilisation of memtables was not exposed in earlier versions, because they only maintained an estimate.
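That estimate (the user-data size) is what you can get at over JMX. A minimal sketch of reading it; the keyspace/table names ("ks", "cf") are placeholders, and the MBean/attribute names are my recollection of the 1.2/2.0 layout, so verify them against your version:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class MemtableDataSizeProbe {
        public static void main(String[] args) throws Exception {
            // Cassandra listens for JMX on port 7199 by default.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // Per-table MBean in 1.2/2.0; "ks" and "cf" are placeholders.
                ObjectName cf = new ObjectName(
                        "org.apache.cassandra.db:type=ColumnFamilies,"
                                + "keyspace=ks,columnfamily=cf");
                // This is the user-data estimate, not heap usage.
                long bytes = (Long) mbs.getAttribute(cf, "MemtableDataSize");
                System.out.println("memtable data size: " + bytes + " bytes");
            }
        }
    }

It reports the same figure as cassandra.db.memtable_data_size; heap usage itself only becomes visible once 2.1's accounting lands.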

The overhead scales linearly with the amount of data in your memtables (assuming the size of each cell is approximately constant).

flush_largest_memtables_at is a setting independent of memtable_total_space_in_mb, and generally has little effect: ordinarily sstable flushes are triggered by hitting the memtable_total_space_in_mb limit. I'm afraid I don't follow where your 3x comes from?
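To make the relationship concrete, here is a simplified sketch of the two triggers (illustrative only, not Cassandra's actual flush code):

    // Illustrative only -- not Cassandra's actual flush logic.
    class FlushTriggers {
        final long memtableTotalSpaceBytes;   // memtable_total_space_in_mb, in bytes
        final double flushLargestMemtablesAt; // heap-usage fraction, default 0.75

        FlushTriggers(long spaceBytes, double heapFraction) {
            this.memtableTotalSpaceBytes = spaceBytes;
            this.flushLargestMemtablesAt = heapFraction;
        }

        boolean shouldFlush(long liveMemtableBytes, double heapUsedFraction) {
            // Ordinary trigger: total memtable space hits the configured limit.
            if (liveMemtableBytes >= memtableTotalSpaceBytes)
                return true;
            // Emergency valve: overall heap pressure; rarely the deciding factor.
            return heapUsedFraction > flushLargestMemtablesAt;
        }
    }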


On 4 June 2014 12:04, Idrén, Johan <Johan.Idren@dice.se> wrote:

Aha, ok. Thanks.


Trying to understand what my cluster is doing:


cassandra.db.memtable_data_size only gets me the actual data, but not the memtable heap memory usage. Is there a way to check for heap memory usage?


I would expect to hit the flush_largest_memtables_at value, and this would be what causes the memtable flush to sstable then? By default 0.75?


Then I would expect the maximum amount of memory used to be ~3x of what I was seeing when I hadn't set memtable_total_space_in_mb (1/4 of the heap by default, max 3/4 before a flush), instead of close to 10x (250 MB vs 2 GB).
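To spell out the arithmetic behind that 3x (the thresholds below are the documented defaults):

    // The 3x expectation, arithmetic only.
    public class RatioCheck {
        public static void main(String[] args) {
            double defaultFraction = 0.25; // memtable ceiling: 1/4 of heap in 2.0
            double flushThreshold  = 0.75; // flush_largest_memtables_at default
            System.out.println("expected max ratio: "
                    + (flushThreshold / defaultFraction)); // 3.0
            System.out.println("observed ratio:     "
                    + (2048.0 / 250.0)); // ~2 GB vs ~250 MB => ~8x
        }
    }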


This is of course assuming that the overhead scales linearly with the amount of data in my table; we're using one table with three cells in this case. If it hardly increases at all, then I'll give up, I guess :)

At least until 2.1.0 comes out and I can compare.


BR

Johan



From: Benedict Elliott Smith <belliottsmith@datastax.com>
Sent: Wednesday, June 4, 2014 12:33 PM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

These measurements tell you the amount of user data stored in the memtables, not the amount of heap used to store it, so the same applies.


On 4 June 2014 11:04, Idrén, Johan <Johan.Idren@dice.se> wrote:

I'm not measuring memtable size by looking at the sstables on disk, no. I'm looking through the JMX data. So I would believe (or hope) that I'm getting relevant data.


If I have a heap of 10 GB and set the memtable usage to 20 GB, I would expect to hit other problems, but I'm not seeing memory usage over 10 GB for the heap, and the machine (which has ~30 GB of memory) is showing ~10 GB free, with ~12 GB used by Cassandra and the rest in caches.


Reading 8k rows/s, writing 2k rows/s on a 3-node cluster, so it's not idling.


BR

Johan



From: Benedict Elliott Smith <belliottsmith@datastax.com>
Sent: Wednesday, June 4, 2014 11:56 AM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?

If you are storing small values in your columns, the object overhead is very substantial. What is 400 MB on disk may well be 4 GB in memtables, so if you are measuring the memtable size by the resulting sstable size, you are not getting an accurate picture. This overhead has been reduced by about 90% in the upcoming 2.1 release, through tickets 6271, 6689 and 6694.
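As a back-of-envelope illustration (the per-cell overhead here is an assumed round number, not a measured constant):

    // Why small cells balloon on heap. The 72 bytes/cell overhead is assumed
    // for illustration; real pre-2.1 overhead varies with the schema.
    public class OverheadEstimate {
        public static void main(String[] args) {
            long cells         = 50_000_000L; // hypothetical cell count
            long payloadBytes  = 8;           // tiny value per cell
            long overheadBytes = 72;          // assumed per-cell bookkeeping
            long data = cells * payloadBytes;                   // ~400 MB of user data
            long heap = cells * (payloadBytes + overheadBytes); // ~4 GB on heap
            System.out.printf("data: %d MB, heap: %d MB (%.0fx)%n",
                    data / 1_000_000, heap / 1_000_000, (double) heap / data);
        }
    }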


On 4 June 2014 10:49, Idrén, Johan <Johan.Idren@dice.se> wrote:

Hi,


I'm seeing some strange behavior in the memtables, both in 1.2.13 and 2.0.7: basically, it looks like they're using 10x less memory than they should based on the documentation and options.


10 GB heap for both clusters.

1.2.x should use 1/3 of the heap for memtables, but it uses at most ~300 MB before flushing.

2.0.7: the same, but with 1/4 and ~250 MB.


In the 2.0.7 cluster I set memtable_total_space_in_mb to 4096, which then allowed Cassandra to use up to ~400 MB for memtables...


I'm now running with 20480 for memtable_total_space_in_mb, and Cassandra is using ~2 GB for memtables.


So, off by 10 somewhere? Has anyone else seen this? I can't find a JIRA for any bug connected to this.

Java 1.7.0_55, JNA 4.1.0 (for the 2.0 cluster)


BR

Johan



