Subject: Memtable tuning in 1.0 and higher
From: Joost van de Wijgerd <jwijgerd@gmail.com>
To: dev@cassandra.apache.org
Date: Thu, 28 Jun 2012 12:09:16 +0200

Hi,

I work for eBuddy. We've been using Cassandra in production since 0.6 (we used 0.7 and now 1.0; we skipped 0.8) and use it for several use cases. One of these is persisting our sessions.

Some background: in our case sessions are long lived; we have a mobile messaging platform where sessions are essentially eternal. We use Cassandra as a system of record for our sessions, so in case of scale-out or failover we can quickly load the session state again. We use Protocol Buffers to serialize our data into a byte buffer and then store this as a column value in a (wide) row. We use a partition-based approach to scale, and each partition has its own row in Cassandra. Each session is mapped to a partition and stored in a column in that row. Every time there is a change in the session (i.e. a message is added, acked, etc.) we schedule the session to be flushed to Cassandra, and every x seconds we flush the dirty sessions.
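In (much simplified) Java the flush loop looks roughly like the sketch below. Note that Session, rowKeyFor() and cassandraWrite() are stand-ins for our real code and for whatever client library is in use, not actual APIs:

    import java.util.concurrent.*;

    public class SessionFlusher {

        // Stand-in for our real session type (serialized with Protocol Buffers).
        interface Session {
            String getId();
            int getPartitionId();
            com.google.protobuf.Message toProtobuf();
        }

        private final ConcurrentMap<String, Session> dirty =
                new ConcurrentHashMap<String, Session>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        // Called on every session change (message added, acked, etc.).
        public void markDirty(Session session) {
            dirty.put(session.getId(), session); // overwrites: latest state wins
        }

        // Flush the dirty set every flushIntervalSeconds.
        public void start(long flushIntervalSeconds) {
            scheduler.scheduleWithFixedDelay(new Runnable() {
                public void run() { flushDirtySessions(); }
            }, flushIntervalSeconds, flushIntervalSeconds, TimeUnit.SECONDS);
        }

        private void flushDirtySessions() {
            for (String id : dirty.keySet()) {
                Session session = dirty.remove(id);
                if (session == null) continue; // raced with another flush
                byte[] value = session.toProtobuf().toByteArray();
                // One row per partition, one column per session: flushing an
                // existing session is therefore an overwrite in Cassandra.
                cassandraWrite(rowKeyFor(session.getPartitionId()), id, value);
            }
        }

        private String rowKeyFor(int partitionId) {
            return "sessions:" + partitionId;
        }

        // Client-specific insert of (rowKey, columnName, columnValue).
        private void cassandraWrite(String rowKey, String column, byte[] value) {
            // e.g. an insert/batch_mutate through whatever client is in use
        }
    }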
So there is a serious number of (over)writes going on and not that many reads (unless there is a failover situation or we scale out). This plays to one of the strengths of Cassandra.

In versions 0.6 and 0.7 it was possible to control the memtable settings on a per-CF basis, so for this particular CF we would set the throughput really high, since there is a huge number of overwrites. In the same cluster we have other CFs with a different load pattern.

Since we moved to version 1.0, however, it has become almost impossible to tune our system for this (mixed) workload: we now have only two knobs to turn (the size of the commit log and the total memtable size), and you have introduced the liveRatio calculation. While this works OK for most workloads, our persistent session store is really hurt by the fact that the liveRatio cannot be lower than 1.0. We generally have an actual liveRatio of 0.025 on this CF due to the huge number of overwrites. We are now artificially tuning up the total memtable size, but this interferes with our other CFs, which have a different workload.

Due to this, our performance has degraded quite a bit: on our 0.7 version we had our session CF tuned so that it would flush only once an hour, thus absorbing far more overwrites, thus having to do fewer compactions, and in a failover scenario most requests could be served straight from the memtable (since we are doing single-column reads there). Currently we flush every 5 to 6 minutes under moderate load, so 10 times worse. This is with the same heap settings etc.

Would you guys consider allowing values lower than 1.0 for the liveRatio calculation? This would help us a lot. Perhaps make it a flag so it can be turned on and off? Ideally I would like the possibility back to tune on a CF-by-CF basis; this could be a special setting that needs to be enabled for power users, with the default being what's there now.

Also, in the current version the liveRatio can never adjust downwards. I see you guys have already made a fix for this in 1.1, but I have not seen it on the 1.0 branch. (I've sketched the behaviour I mean in the P.S. below.)

Let me know what you think.

Kind regards,

Joost

--
Joost van de Wijgerd
joost.van.de.wijgerd@Skype
http://www.linkedin.com/in/jwijgerd
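P.S. To be precise about the behaviour I mean, here it is written out as a small Java sketch. This is my paraphrase of what the 1.0 memtable code appears to do, not a verbatim copy, and the variable names are mine:

    // Each update recomputes the ratio of live heap size to the
    // serialized throughput written into the memtable.
    double measuredRatio = (double) liveHeapSize / serializedThroughput;

    // The measured value can never be used below a floor of 1.0...
    double clamped = Math.max(1.0, measuredRatio);

    // ...and only the maximum ever seen is kept, so the ratio can never
    // adjust downwards either (the part the 1.1 fix addresses).
    liveRatio = Math.max(liveRatio, clamped);

    // Our session CF measures ~0.025, so the floor alone overestimates
    // our memtable memory usage by a factor of ~40 -- hence flushing
    // every few minutes instead of once an hour.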