From: "Dan Hendry"
To: user@cassandra.apache.org
Subject: RE: Memtable flush thresholds - what am I missing?
Date: Thu, 18 Aug 2011 16:59:29 -0400

Interesting. Just to clarify, there are three main conditions which will trigger a flush (based on data size; rough sketch below):

1. The serialized size of a memtable exceeds the per-CF memtable_throughput setting.
2. For a single CF: (serialized size) * (live ratio) * (maximum possible memtables in memory) > memtable_total_space_in_mb
3. sum_all_cf((serialized size) * (live ratio)) > memtable_total_space_in_mb

This makes a lot of sense to me, particularly in comparison to the 0.7 era, when the Java overhead was not considered.

The fact that memtable_total_space_in_mb and memtable_throughput (in MB) actually refer to different megabytes (live vs serialized) is pretty confusing and should really be made more explicit in the cli and/or cassandra.yaml.
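For what it's worth, here is a rough sketch of those three checks as I read them. This is not the actual Cassandra code: liveRatio, maxMemtablesInFlight and the other names are made up for illustration, and the values in main() are just the figures from the flush log line quoted below plus some assumed placeholders.

    // Rough sketch of the three data-size flush triggers described above.
    // NOT the actual Cassandra implementation; all names are illustrative.
    public class FlushDecisionSketch {

        static boolean shouldFlush(long serializedBytes,      // memtable serialized size
                                   double liveRatio,          // live (in-JVM) bytes per serialized byte
                                   int maxMemtablesInFlight,  // memtables that could be held in memory at once
                                   long cfThroughputBytes,    // per-CF memtable_throughput, in bytes
                                   long globalSpaceBytes,     // memtable_total_space_in_mb, in bytes
                                   long sumLiveBytesAllCfs) { // sum over all CFs of serialized * live ratio
            // 1. Serialized size exceeds the per-CF memtable_throughput setting.
            if (serializedBytes > cfThroughputBytes)
                return true;
            // 2. Single CF: projected live size of all its in-flight memtables would
            //    exceed the global memtable_total_space_in_mb budget.
            if (serializedBytes * liveRatio * maxMemtablesInFlight > globalSpaceBytes)
                return true;
            // 3. Live size summed across every CF exceeds memtable_total_space_in_mb.
            return sumLiveBytesAllCfs > globalSpaceBytes;
        }

        public static void main(String[] args) {
            // Figures from the flush log line quoted below: 17203504 serialized bytes,
            // 600292480 live bytes, i.e. a live ratio of roughly 35.
            long serialized = 17203504L;
            double liveRatio = 600292480.0 / 17203504.0;
            // Assumed placeholders: 8 memtables potentially in flight, 70 MB per-CF
            // throughput, ~4 GB global memtable space, ~600 MB live across all CFs.
            boolean flush = shouldFlush(serialized, liveRatio, 8,
                                        70L * 1024 * 1024,
                                        4L * 1024 * 1024 * 1024,
                                        600292480L);
            // Condition 2 fires here: ~600 MB live * 8 is roughly 4.8 GB > 4 GB.
            System.out.println("flush? " + flush);
        }
    }

With those (assumed) numbers it is condition 2 that trips, which would explain flushing at ~17 MB serialized even though memtable_throughput is set to 70 MB.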
Dan

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com]
Sent: August-18-11 15:51
To: user@cassandra.apache.org
Subject: Re: Memtable flush thresholds - what am I missing?

See http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/, specifically the section on memtable_total_space_in_mb

On Thu, Aug 18, 2011 at 2:43 PM, Dan Hendry wrote:
> I am in the process of trying to tune the memtable flush thresholds for a
> particular column family (super column family, to be specific) in my
> Cassandra 0.8.1 cluster. This CF is reasonably heavily used and is getting
> flushed roughly every 5-8 minutes, which is hardly optimal, particularly
> given I have JVM memory to spare at the moment. I am trying to understand
> the Cassandra logs, but the numbers I am seeing are not making any sense.
>
> The initial memtable settings for this CF were throughput = 70 MB and
> operations = 0.7 million. The flush messages I was seeing in the logs
> (after a "flushing high-traffic column family" message for this CF) looked
> like:
>
>     "Enqueuing flush of Memtable-.... (17203504/600292480
>     serialized/live bytes, 320432 ops)"
>
> So... uh... ~17 MB serialized, ~600 MB live (whatever that means), and
> ~320k ops; the resulting sstables are ~34 MB. This is roughly what every
> flush looks like. Two minutes before this particular flush, a GC-triggered
> StatusLogger entry shows ops and data for the CF as "122592,230094268", or
> 122k ops (sensible) and 230 MB (what???). For at least 2 minutes prior to
> THAT message, nothing else happened (flushes, compaction, etc.) for any
> column family, which means that this series of events (flush to GC log
> entry to flush) is reasonably isolated from any other activity.
>
> None of these numbers look even *remotely* close to 70 MB (the
> memtable_throughput setting). Anyway, via JMX I went in and changed
> throughput to 200 MB and operations to 0.5. This did *absolutely nothing*
> to the flush behaviour: still ~17 MB serialized, ~600 MB live, ~320k ops,
> ~34 MB sstables, and flushes every 5-8 minutes (I waited for a few flushes
> in case the change took some time to be applied). I also tried changing
> the operations threshold to 0.2 million, which DID work, so it's not a
> case of the settings not being respected.
>
> WTF is going on? What is deciding that a flush is necessary and where are
> all of these crazy size discrepancies coming from? Some additional info
> and things to point out:
>
> · I am NOT seeing "the heap is X full, Cassandra will now flush the two
>   largest memtables" warnings or any other errors/unexpected things
> · The sum of memtable_throughput across all 10 CFs is 770 MB, well less
>   than the default global memtable threshold of ~4 GB on a 12 GB Java heap
> · There are no major compactions running on this machine and no repairs
>   running across the cluster
> · Hinted handoff is disabled
>
> Any insight would be appreciated.
>
> Dan Hendry

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com