Subject: Memtable tuning in 1.0 and higher
From: Joost van de Wijgerd <jwijgerd@gmail.com>
To: dev@cassandra.apache.org
Date: Thu, 28 Jun 2012 12:09:16 +0200

Hi,

I work for eBuddy. We've been using Cassandra in production since 0.6 (we used 0.7 and now 1.0; we skipped 0.8) and use it for several use cases. One of these is persisting our sessions.

Some background: in our case sessions are long lived; we have a mobile messaging platform where sessions are essentially eternal. We use Cassandra as a system of record for our sessions, so in case of scale-out or failover we can quickly load the session state again. We use Protocol Buffers to serialize our data into a byte buffer and then store this as a column value in a (wide) row. We use a partition-based approach to scale, and each partition has its own row in Cassandra. Each session is mapped to a partition and stored in a column in that row. Every time there is a change in the session (i.e. a message is added, acked, etc.) we schedule the session to be flushed to Cassandra, and every x seconds we flush the dirty sessions.
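In (much simplified) Java the flush loop looks roughly like the sketch below. Note that Session, rowKeyFor() and cassandraWrite() are stand-ins for our real code and for whatever client library is in use, not actual APIs:

    import java.util.concurrent.*;

    public class SessionFlusher {

        // Stand-in for our real session type (serialized with Protocol Buffers).
        interface Session {
            String getId();
            int getPartitionId();
            com.google.protobuf.Message toProtobuf();
        }

        private final ConcurrentMap<String, Session> dirty =
                new ConcurrentHashMap<String, Session>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        // Called on every session change (message added, acked, etc.).
        public void markDirty(Session session) {
            dirty.put(session.getId(), session); // overwrites: latest state wins
        }

        // Flush the dirty set every flushIntervalSeconds.
        public void start(long flushIntervalSeconds) {
            scheduler.scheduleWithFixedDelay(new Runnable() {
                public void run() { flushDirtySessions(); }
            }, flushIntervalSeconds, flushIntervalSeconds, TimeUnit.SECONDS);
        }

        private void flushDirtySessions() {
            for (String id : dirty.keySet()) {
                Session session = dirty.remove(id);
                if (session == null) continue; // raced with another flush
                byte[] value = session.toProtobuf().toByteArray();
                // One row per partition, one column per session: flushing an
                // existing session is therefore an overwrite in Cassandra.
                cassandraWrite(rowKeyFor(session.getPartitionId()), id, value);
            }
        }

        private String rowKeyFor(int partitionId) {
            return "sessions:" + partitionId;
        }

        // Client-specific insert of (rowKey, columnName, columnValue).
        private void cassandraWrite(String rowKey, String column, byte[] value) {
            // e.g. an insert/batch_mutate through whatever client is in use
        }
    }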
So there is a serious number of (over)writes going on and not that many reads (unless there is a failover situation or we scale out). This plays to one of the strengths of Cassandra.

In versions 0.6 and 0.7 it was possible to control the memtable settings on a per-CF basis, so for this particular CF we would set the throughput really high, since there is a huge number of overwrites. In the same cluster we have other CFs with a different load pattern.

Since we moved to version 1.0, however, it has become almost impossible to tune our system for this (mixed) workload: we now have only two knobs to turn (the size of the commit log and the total memtable size), and you have introduced the liveRatio calculation. While this works OK for most workloads, our persistent session store is really hurt by the fact that the liveRatio cannot be lower than 1.0. We generally have an actual liveRatio of 0.025 on this CF due to the huge number of overwrites. We are now artificially tuning up the total memtable size, but this interferes with our other CFs, which have a different workload.

Due to this, our performance has degraded quite a bit: on our 0.7 version we had our session CF tuned so that it would flush only once an hour, thus absorbing far more overwrites, thus having to do fewer compactions, and in a failover scenario most requests could be served straight from the memtable (since we are doing single-column reads there). Currently we flush every 5 to 6 minutes under moderate load, so 10 times worse. This is with the same heap settings etc.

Would you guys consider allowing values lower than 1.0 for the liveRatio calculation? This would help us a lot. Perhaps make it a flag so it can be turned on and off? Ideally I would like the possibility back to tune on a CF-by-CF basis; this could be a special setting that needs to be enabled for power users, with the default being what's there now.

Also, in the current version the liveRatio can never adjust downwards. I see you guys have already made a fix for this in 1.1, but I have not seen it on the 1.0 branch. (I've sketched the behaviour I mean in the P.S. below.)

Let me know what you think.

Kind regards,

Joost

--
Joost van de Wijgerd
joost.van.de.wijgerd@Skype
http://www.linkedin.com/in/jwijgerd
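P.S. To be precise about the behaviour I mean, here it is written out as a small Java sketch. This is my paraphrase of what the 1.0 memtable code appears to do, not a verbatim copy, and the variable names are mine:

    // Each update recomputes the ratio of live heap size to the
    // serialized throughput written into the memtable.
    double measuredRatio = (double) liveHeapSize / serializedThroughput;

    // The measured value can never be used below a floor of 1.0...
    double clamped = Math.max(1.0, measuredRatio);

    // ...and only the maximum ever seen is kept, so the ratio can never
    // adjust downwards either (the part the 1.1 fix addresses).
    liveRatio = Math.max(liveRatio, clamped);

    // Our session CF measures ~0.025, so the floor alone overestimates
    // our memtable memory usage by a factor of ~40 -- hence flushing
    // every few minutes instead of once an hour.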