From: "Dan Hendry" <dan.hendry.junk@gmail.com>
To: user@cassandra.apache.org
Subject: Memtable flush thresholds - what am I missing?
Date: Thu, 18 Aug 2011 15:43:37 -0400

I am in the process of trying to tune the memtable flush thresholds for a particular column family (a super column family, to be specific) in my Cassandra 0.8.1 cluster. This CF is reasonably heavily used and is getting flushed roughly every 5-8 minutes, which is hardly optimal, particularly given that I have JVM memory to spare at the moment. I am trying to understand the Cassandra logs, but the numbers I am seeing are not making any sense.

The initial memtable settings for this CF were throughput = 70 MB and operations = 0.7 million. The flush messages I was seeing in the logs (after a "flushing high-traffic column family" message for this CF) looked like:

    Enqueuing flush of Memtable-.... (17203504/600292480 serialized/live bytes, 320432 ops)

So... uh... ~17 MB serialized, ~600 MB live (whatever that means), and ~320k ops; the resulting sstables are ~34 MB. This is roughly what every flush looks like. Two minutes before this particular flush, a GC-triggered StatusLogger entry shows ops and data for the CF as "122592,230094268", or 122k ops (sensible) and 230 MB (what???). For at least 2 minutes prior to THAT message, nothing else happened (flushes, compactions, etc.) for any column family, which means that this series of events (flush to GC log entry to flush) is reasonably isolated from any other activity.
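
For concreteness, here is the arithmetic behind those approximations (a throwaway sketch in plain Java, nothing Cassandra-specific; the numbers are copied straight from the flush line above):

    // Convert the raw counts from the flush log line into MB and a ratio.
    public class FlushLineMath {
        public static void main(String[] args) {
            long serializedBytes = 17203504L; // "serialized" from the log line
            long liveBytes = 600292480L;      // "live" from the log line
            double mb = 1024.0 * 1024.0;
            System.out.printf("serialized: %.1f MB%n", serializedBytes / mb); // ~16.4 MB
            System.out.printf("live: %.1f MB%n", liveBytes / mb);             // ~572.5 MB
            System.out.printf("live/serialized: %.1fx%n",
                    (double) liveBytes / serializedBytes);                    // ~34.9x
        }
    }

The live estimate is roughly 35x the serialized size, which is the discrepancy I cannot account for.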

None of these numbers look even *remotely* close to 70 MB (the memtable_throughput setting). Anyway, via JMX I went in and changed throughput to 200 MB and operations to 0.5 million. This did *absolutely nothing* to the flush behaviour: still ~17 MB serialized, ~600 MB live, ~320k ops, ~34 MB sstables, and flushes every 5-8 minutes (I waited for a few flushes in case the change took some time to be applied). I also tried changing the operations threshold to 0.2 million, which DID work, so it's not a case of the settings not being respected.
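
For reference, this is the kind of change I made (a minimal JMX sketch, not my exact steps: the keyspace/CF names are placeholders, and I am assuming the 0.8-era ColumnFamilyStoreMBean attribute names):

    import javax.management.Attribute;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class SetMemtableThresholds {
        public static void main(String[] args) throws Exception {
            // 7199 is the default Cassandra JMX port as of 0.8.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // "MyKeyspace" and "MyCF" stand in for the real keyspace/CF.
                ObjectName cf = new ObjectName(
                        "org.apache.cassandra.db:type=ColumnFamilies,"
                        + "keyspace=MyKeyspace,columnfamily=MyCF");
                mbs.setAttribute(cf, new Attribute("MemtableThroughputInMB", 200));
                mbs.setAttribute(cf, new Attribute("MemtableOperationsInMillions", 0.5));
                // Read the attributes back to confirm the change was applied.
                System.out.println(mbs.getAttribute(cf, "MemtableThroughputInMB"));
                System.out.println(mbs.getAttribute(cf, "MemtableOperationsInMillions"));
            } finally {
                jmxc.close();
            }
        }
    }

The operations change to 0.2 million through the same path did take effect, so the attribute writes themselves clearly work.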

WTF is going on? What is deciding that a flush is necessary, and where are all of these crazy size discrepancies coming from? Some additional info and things to point out:

- I am NOT seeing the "heap is X full, Cassandra will now flush the two largest memtables" warnings, or any other errors/unexpected things.
- The sum of memtable_throughput across all 10 CFs is 770 MB, well under the default global memtable threshold of ~4 GB on a 12 GB Java heap.
- There are no major compactions running on this machine and no repairs running across the cluster.
- Hinted handoff is disabled.

Any insight would be appreciated.

Dan Hendry
