From: "Dan Hendry"
To: user@cassandra.apache.org
Subject: RE: Memtable flush thresholds - what am I missing?
Date: Thu, 18 Aug 2011 16:59:29 -0400

Interesting. Just to clarify, there are three main conditions which will trigger a flush (based on data size; rough sketch below):

1. The serialized size of a memtable exceeds the per-CF memtable_throughput setting.
2. For a single CF: (serialized size) * (live ratio) * (maximum possible memtables in memory) > memtable_total_space_in_mb
3. sum_all_cf((serialized size) * (live ratio)) > memtable_total_space_in_mb

This makes a lot of sense to me, particularly in comparison to the 0.7 era, when the Java overhead was not considered.

The fact that memtable_total_space_in_mb and memtable_throughput (in MB) actually refer to different megabytes (live vs serialized) is pretty confusing and should really be made more explicit in the cli and/or cassandra.yaml.
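For what it's worth, here is a rough sketch of those three checks as I read them. This is not the actual Cassandra code: liveRatio, maxMemtablesInFlight and the other names are made up for illustration, and the values in main() are just the figures from the flush log line quoted below plus some assumed placeholders.

    // Rough sketch of the three data-size flush triggers described above.
    // NOT the actual Cassandra implementation; all names are illustrative.
    public class FlushDecisionSketch {

        static boolean shouldFlush(long serializedBytes,      // memtable serialized size
                                   double liveRatio,          // live (in-JVM) bytes per serialized byte
                                   int maxMemtablesInFlight,  // memtables that could be held in memory at once
                                   long cfThroughputBytes,    // per-CF memtable_throughput, in bytes
                                   long globalSpaceBytes,     // memtable_total_space_in_mb, in bytes
                                   long sumLiveBytesAllCfs) { // sum over all CFs of serialized * live ratio
            // 1. Serialized size exceeds the per-CF memtable_throughput setting.
            if (serializedBytes > cfThroughputBytes)
                return true;
            // 2. Single CF: projected live size of all its in-flight memtables would
            //    exceed the global memtable_total_space_in_mb budget.
            if (serializedBytes * liveRatio * maxMemtablesInFlight > globalSpaceBytes)
                return true;
            // 3. Live size summed across every CF exceeds memtable_total_space_in_mb.
            return sumLiveBytesAllCfs > globalSpaceBytes;
        }

        public static void main(String[] args) {
            // Figures from the flush log line quoted below: 17203504 serialized bytes,
            // 600292480 live bytes, i.e. a live ratio of roughly 35.
            long serialized = 17203504L;
            double liveRatio = 600292480.0 / 17203504.0;
            // Assumed placeholders: 8 memtables potentially in flight, 70 MB per-CF
            // throughput, ~4 GB global memtable space, ~600 MB live across all CFs.
            boolean flush = shouldFlush(serialized, liveRatio, 8,
                                        70L * 1024 * 1024,
                                        4L * 1024 * 1024 * 1024,
                                        600292480L);
            // Condition 2 fires here: ~600 MB live * 8 is roughly 4.8 GB > 4 GB.
            System.out.println("flush? " + flush);
        }
    }

With those (assumed) numbers it is condition 2 that trips, which would explain flushing at ~17 MB serialized even though memtable_throughput is set to 70 MB.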
Dan

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com]
Sent: August-18-11 15:51
To: user@cassandra.apache.org
Subject: Re: Memtable flush thresholds - what am I missing?

See http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/, specifically the section on memtable_total_space_in_mb

On Thu, Aug 18, 2011 at 2:43 PM, Dan Hendry wrote:
> I am in the process of trying to tune the memtable flush thresholds for a
> particular column family (super column family, to be specific) in my
> Cassandra 0.8.1 cluster. This CF is reasonably heavily used and is getting
> flushed roughly every 5-8 minutes, which is hardly optimal, particularly
> given I have JVM memory to spare at the moment. I am trying to understand
> the Cassandra logs, but the numbers I am seeing are not making any sense.
>
> The initial memtable settings for this CF were throughput = 70 MB and
> operations = 0.7 million. The flush messages I was seeing in the logs
> (after a "flushing high-traffic column family" message for this CF) looked
> like:
>
>     "Enqueuing flush of Memtable-.... (17203504/600292480
>     serialized/live bytes, 320432 ops)"
>
> So... uh... ~17 MB serialized, ~600 MB live (whatever that means), and
> ~320k ops; the resulting sstables are ~34 MB. This is roughly what every
> flush looks like. Two minutes before this particular flush, a GC-triggered
> StatusLogger entry shows ops and data for the CF as "122592,230094268", or
> 122k ops (sensible) and 230 MB (what???). For at least 2 minutes prior to
> THAT message, nothing else happened (flushes, compaction, etc.) for any
> column family, which means that this series of events (flush to GC log
> entry to flush) is reasonably isolated from any other activity.
>
> None of these numbers look even *remotely* close to 70 MB (the
> memtable_throughput setting). Anyway, via JMX I went in and changed
> throughput to 200 MB and operations to 0.5. This did *absolutely nothing*
> to the flush behaviour: still ~17 MB serialized, ~600 MB live, ~320k ops,
> ~34 MB sstables, and flushes every 5-8 minutes (I waited for a few flushes
> in case the change took some time to be applied). I also tried changing
> the operations threshold to 0.2 million, which DID work, so it's not a
> case of the settings not being respected.
>
> WTF is going on? What is deciding that a flush is necessary and where are
> all of these crazy size discrepancies coming from? Some additional info
> and things to point out:
>
> · I am NOT seeing "the heap is X full, Cassandra will now flush the two
>   largest memtables" warnings or any other errors/unexpected things
> · The sum of memtable_throughput across all 10 CFs is 770 MB, well less
>   than the default global memtable threshold of ~4 GB on a 12 GB Java heap
> · There are no major compactions running on this machine and no repairs
>   running across the cluster
> · Hinted handoff is disabled
>
> Any insight would be appreciated.
>
> Dan Hendry

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com