From: "Dan Hendry" <dan.hendry.junk@gmail.com>
To: user@cassandra.apache.org
Subject: Memtable flush thresholds - what am I missing?
Date: Thu, 18 Aug 2011 15:43:37 -0400

I am in the process of trying to tune the memtable flush thresholds for a particular column family (a super column family, to be specific) in my Cassandra 0.8.1 cluster. This CF is reasonably heavily used and is getting flushed roughly every 5-8 minutes, which is hardly optimal, particularly given that I have JVM memory to spare at the moment. I am trying to understand the Cassandra logs, but the numbers I am seeing are not making any sense.

The initial memtable settings for this CF were throughput = 70 MB and operations = 0.7 million. The flush messages I was seeing in the logs (after a "flushing high-traffic column family" message for this CF) looked like:

    Enqueuing flush of Memtable-.... (17203504/600292480 serialized/live bytes, 320432 ops)

So... uh... ~17 MB serialized, ~600 MB live (whatever that means), and ~320k ops; the resulting sstables are ~34 MB. This is roughly what every flush looks like. Two minutes before this particular flush, a GC-triggered StatusLogger entry shows ops and data for the CF as "122592,230094268", or 122k ops (sensible) and 230 MB (what???). For at least 2 minutes prior to THAT message, nothing else happened (flushes, compactions, etc.) for any column family, which means that this series of events (flush to GC log entry to flush) is reasonably isolated from any other activity.
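
For concreteness, here is the arithmetic behind those approximations (a throwaway sketch in plain Java, nothing Cassandra-specific; the numbers are copied straight from the flush line above):

    // Convert the raw counts from the flush log line into MB and a ratio.
    public class FlushLineMath {
        public static void main(String[] args) {
            long serializedBytes = 17203504L; // "serialized" from the log line
            long liveBytes = 600292480L;      // "live" from the log line
            double mb = 1024.0 * 1024.0;
            System.out.printf("serialized: %.1f MB%n", serializedBytes / mb); // ~16.4 MB
            System.out.printf("live: %.1f MB%n", liveBytes / mb);             // ~572.5 MB
            System.out.printf("live/serialized: %.1fx%n",
                    (double) liveBytes / serializedBytes);                    // ~34.9x
        }
    }

The live estimate is roughly 35x the serialized size, which is the discrepancy I cannot account for.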

None of these numbers look even *remotely* close to 70 MB (the memtable_throughput setting). Anyway, via JMX I went in and changed throughput to 200 MB and operations to 0.5 million. This did *absolutely nothing* to the flush behaviour: still ~17 MB serialized, ~600 MB live, ~320k ops, ~34 MB sstables, and flushes every 5-8 minutes (I waited for a few flushes in case the change took some time to be applied). I also tried changing the operations threshold to 0.2 million, which DID work, so it's not a case of the settings not being respected.
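
For reference, this is the kind of change I made (a minimal JMX sketch, not my exact steps: the keyspace/CF names are placeholders, and I am assuming the 0.8-era ColumnFamilyStoreMBean attribute names):

    import javax.management.Attribute;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class SetMemtableThresholds {
        public static void main(String[] args) throws Exception {
            // 7199 is the default Cassandra JMX port as of 0.8.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // "MyKeyspace" and "MyCF" stand in for the real keyspace/CF.
                ObjectName cf = new ObjectName(
                        "org.apache.cassandra.db:type=ColumnFamilies,"
                        + "keyspace=MyKeyspace,columnfamily=MyCF");
                mbs.setAttribute(cf, new Attribute("MemtableThroughputInMB", 200));
                mbs.setAttribute(cf, new Attribute("MemtableOperationsInMillions", 0.5));
                // Read the attributes back to confirm the change was applied.
                System.out.println(mbs.getAttribute(cf, "MemtableThroughputInMB"));
                System.out.println(mbs.getAttribute(cf, "MemtableOperationsInMillions"));
            } finally {
                jmxc.close();
            }
        }
    }

The operations change to 0.2 million through the same path did take effect, so the attribute writes themselves clearly work.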

WTF is going on? What is deciding that a flush is necessary, and where are all of these crazy size discrepancies coming from? Some additional info and things to point out:

- I am NOT seeing the "heap is X full, Cassandra will now flush the two largest memtables" warnings, or any other errors/unexpected things.
- The sum of memtable_throughput across all 10 CFs is 770 MB, well under the default global memtable threshold of ~4 GB on a 12 GB Java heap.
- There are no major compactions running on this machine and no repairs running across the cluster.
- Hinted handoff is disabled.

Any insight would be appreciated.

Dan Hendry
