From: Idrén, Johan <Johan.Idren@dice.se>
To: user@cassandra.apache.org
Subject: RE: memtable mem usage off by 10?
Date: Wed, 4 Jun 2014 12:26:08 +0000

Oh, well, OK, that explains why I'm not seeing a flush at 750MB. Sorry, I was going by the documentation; it claims the property is still around in 2.0.


If we skip that, part of my reply still makes sense:


With memtable_total_space_in_mb set to 20480, memtables are flushed at a reported value of ~2GB.


With a constant overhead of ~10x, as suggested, this would mean that it used 20GB, which is 2x the size of the heap.


That shouldn't work: according to the OS, Cassandra doesn't use more than ~11-12GB.
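
Spelled out, the objection is simple arithmetic. A quick sketch in Python (the ~10x factor is the one suggested below in the thread, not a measured value):

    # Back-of-the-envelope check of the numbers in this thread.
    heap_gb = 10.0               # -Xmx on both clusters
    reported_flush_gb = 2.0      # memtable size reported over JMX at flush time
    assumed_overhead = 10.0      # suggested constant overhead factor

    implied_heap_gb = reported_flush_gb * assumed_overhead
    print(f"implied memtable heap use: {implied_heap_gb:.0f} GB")   # 20 GB
    print(f"ratio to heap: {implied_heap_gb / heap_gb:.1f}x")       # 2.0x
    # ...yet the OS reports only ~11-12 GB resident for the whole process,
    # so a constant 10x overhead cannot be the whole story as stated.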



________________________________
From: Benedict Elliott Smith <belliottsmith@datastax.com>
Sent: Wednesday, June 4, 2014 2:07 PM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?
 
I'm confused: there is no flush_largest_memtables_at property in C* 2.0?


On 4 June 2014 12:55, Idrén, Johan <Johan.Idren@dice.se> wrote:

Ok, so the overhead is a constant modifier, right.


I arrived at the 3x with the following assumptions:


heap is 10GB

Default memory for memtable usage is 1/4 of heap in C* 2.0

max memory used for memtables is 2.5GB (10/4)

flush_largest_memtables_at is 0.75

flush largest memtables when memtables use 7.5GB (3/4 of heap, 3x the default)


With an overhead of 10x, it makes sense that my memtables are flushed when the JMX data says they are at ~250MB, i.e. 2.5GB actually used, i.e. 1/4 of the heap.


After I've set memtable_total_space_in_mb to a value larger than 7.5GB, it should still not go over 7.5GB on account of flush_largest_memtables_at (3/4 of the heap).


So I would expect to see memtables flushed to disk when they're reportedly at around 750MB.
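
Numerically, those assumptions combine as follows (a sketch restating the figures above; the 10x overhead is an assumption, applied to get the expected JMX-reported sizes):

    heap_mb = 10 * 1024          # 10 GB heap
    default_fraction = 1 / 4     # C* 2.0 default share of heap for memtables
    flush_largest_at = 0.75      # assumed threshold: flush at 3/4 of heap
    overhead = 10                # assumed constant object-overhead factor

    default_limit_mb = heap_mb * default_fraction       # 2560 MB actually used
    emergency_limit_mb = heap_mb * flush_largest_at     # 7680 MB actually used

    # If JMX reports raw data size only, the reported size at flush would be:
    print(default_limit_mb / overhead)     # ~256 MB -- matches the ~250 MB seen
    print(emergency_limit_mb / overhead)   # ~768 MB -- the ~750 MB expected above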


With memtable_total_space_in_mb set to 20480, memtables are flushed at a reported value of ~2GB.


With a constant overhead, this would mean that it used 20GB, which is 2x the size of the heap, instead of 3/4 of the heap as it should be if flush_largest_memtables_at were being respected.


This shouldn't be possible.



________________________________
From: Benedict Elliott Smith <belliottsmith@datastax.com>
Sent: Wednesday, June 4, 2014 1:19 PM

To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?
 
Unfortunately it looks like the heap utilisation of memtables was not exposed in earlier versions, because they only maintained an estimate.

The overhead scales linearly with the amount of data in your memtables (assuming the size of each cell is approx. constant).

flush_largest_memtables_at is a setting independent of memtable_total_space_in_mb, and generally has little effect. Ordinarily sstable flushes are triggered by hitting the memtable_total_space_in_mb limit. I'm afraid I don't follow where your 3x comes from?
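
As an illustration of that trigger logic, a rough model in Python (an invented sketch, not Cassandra's actual implementation; all names are made up):

    # Rough model of the flush trigger described above: the global
    # memtable_total_space_in_mb limit drives flushes, and the accounted
    # size includes object overhead, not just the raw data JMX reports.
    def maybe_flush(memtables, total_space_mb):
        total_live_mb = sum(m.live_size_mb for m in memtables)  # incl. overhead
        if total_live_mb >= total_space_mb:
            largest = max(memtables, key=lambda m: m.live_size_mb)
            largest.flush_to_sstable()  # free the largest memtable first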


On 4 June 2014 12:04, Idrén, Johan <Johan.Idren@dice.se> wrote:

Aha, ok. Thanks.


Trying to understand what my cluster is doing:


cassandra.db.memtable_data_size only gets me the actual data, not the memtable heap memory usage. Is there a way to check the heap memory usage?
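
For reference, one way to pull that per-table figure programmatically. This sketch assumes a Jolokia agent is attached to the Cassandra JVM on its default port 8778, and 'myks'/'mytable' are placeholder names; the MBean exposes only the raw data size, which is exactly the limitation discussed here:

    import requests  # assumes a Jolokia agent is attached to the Cassandra JVM

    # ColumnFamilyStoreMBean in the 1.2/2.0 line; keyspace/table are placeholders.
    mbean = ("org.apache.cassandra.db:type=ColumnFamilies,"
             "keyspace=myks,columnfamily=mytable")

    resp = requests.get(f"http://localhost:8778/jolokia/read/{mbean}/MemtableDataSize")
    print(resp.json()["value"], "bytes of raw memtable data (overhead not included)")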


I would expect to hit the flush_largest_memtables_at value, and this would be what causes the memtable flush to sstable then? By default 0.75?


Then I would expect the maximum amount of memory used to be ~3x what I was seeing when I hadn't set memtable_total_space_in_mb (1/4 by default, max 3/4 before a flush), instead of close to 10x (250MB vs 2GB).


This is of course assuming that the overhead scales linearly with the amount of data in my table; we're using one table with three cells in this case. If it hardly increases at all, then I'll give up, I guess :)

At least until 2.1.0 comes out and I can compare.


BR

Johan



________________________________
From: Benedict Elliott Smith <belliottsmith@datastax.com>
Sent: Wednesday, June 4, 2014 12:33 PM

To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?
 
These measurements tell you the amount of user data stored in the memtables, not the amount of heap used to store it, so the same applies.


On 4 June 2014 11:04, Idrén, Johan <Johan.Idren@dice.se> wrote:

I'm not measuring memtable size by looking at the sstables on disk, no. I'm looking through the JMX data. So I would believe (or hope) that I'm getting relevant data.


If I have a heap of 10GB and set the memtable usage to 20GB, I would expect to hit other problems, but I'm not seeing memory usage over 10GB for the heap, and the machine (which has ~30GB of memory) is showing ~10GB free, with ~12GB used by Cassandra and the rest in caches.


Reading 8k rows/s, writing 2k rows/s on a 3-node cluster. So it's not idling.


BR

Johan



________________________________
From: Benedict Elliott Smith <belliottsmith@datastax.com>
Sent: Wednesday, June 4, 2014 11:56 AM
To: user@cassandra.apache.org
Subject: Re: memtable mem usage off by 10?
 
If you are storing small values in your columns, the object overhead is very substantial: what is 400MB on disk may well be 4GB in memtables, so if you are measuring the memtable size by the resulting sstable size, you are not getting an accurate picture. This overhead has been reduced by about 90% in the upcoming 2.1 release, through tickets 6271, 6689 and 6694.
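
To see how a ~10x blow-up can arise with tiny cells, a rough per-cell estimate (the byte counts here are illustrative guesses, not the actual 2.0 object layout):

    payload_bytes = 20        # a small cell: short column name + short value
    overhead_bytes = 180      # rough guess: object headers, references,
                              # skip-list entries, boxed fields per on-heap cell

    inflation = (payload_bytes + overhead_bytes) / payload_bytes
    print(f"~{inflation:.0f}x on-heap inflation for {payload_bytes}-byte cells")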


On 4 June 2014 10:49, Idrén, Johan <Johan.Idren@dice.se> wrote:

Hi,


I'm seeing some strange behavior of the memtables in both 1.2.13 and 2.0.7: basically, it looks like they use 10x less memory than they should based on the documentation and options.


10GB heap for both clusters.

1.2.x should use 1/3 of the heap for memtables, but it uses at most ~300MB before flushing.


2.0.7: the same, but 1/4 of the heap and ~250MB.


In the 2.0.7 cluster I set memtable_total_space_in_mb to 4096, which then allowed Cassandra to use up to ~400MB for memtables...


I'm now running with 20480 for memtable_total_space_in_mb and Cassandra is using ~2GB for memtables.


So, off by 10 somewhere? Has anyone else seen this? I can't find a JIRA for any bug connected to this.
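
The implied factor is consistent across all four data points reported in this thread. A quick sketch of the arithmetic:

    # (memtable space limit, JMX-reported size at flush), both in MB
    observations = [
        (10240 / 3, 300),   # 1.2.x default: 1/3 of a 10 GB heap, flushed ~300 MB
        (10240 / 4, 250),   # 2.0.7 default: 1/4 of a 10 GB heap, flushed ~250 MB
        (4096,      400),   # 2.0.7, memtable_total_space_in_mb: 4096
        (20480,     2048),  # 2.0.7, memtable_total_space_in_mb: 20480
    ]
    for limit_mb, reported_mb in observations:
        print(f"limit {limit_mb:7.0f} MB -> flushed at {reported_mb} MB "
              f"(~{limit_mb / reported_mb:.0f}x)")
    # Every ratio lands near 10x: consistent with the limit applying to
    # data + overhead while JMX reports raw data only.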

Java 1.7.0_55, JNA 4.1.0 (for the 2.0 cluster)


BR

Johan




