From: Jeremy Hanna
Subject: 4/20 nodes get disproportionate amount of mutations
Date: Tue, 23 Aug 2011 01:40:20 -0500
To: user@cassandra.apache.org

Recently we've been having issues where, as soon as we start doing heavy writes (via Hadoop), it really hammers 4 nodes out of 20. We're using the random partitioner and we've set the initial tokens for our 20 nodes according to the general spacing formula, except for a few token offsets where we've replaced dead nodes.

When I say hammers, I'm looking at nodetool tpstats: those 4 nodes have completed something like 70 million mutation stage events, whereas the rest of the cluster has completed anywhere from 2 to 20 million mutation stage events. On those 4 nodes, the logs show evidence of the mutation stage backing up and a lot of dropped read repair messages. It looks like quite a bit of flushing is going on and, consequently, automatic minor compactions.

We are running 0.7.8 and have about 34 column families (counting secondary indexes as column families), so we can't get too large with our memtable throughput in MB. We would like to upgrade to 0.8.4 (not least because of JAMM), but it seems that something else is going on with our cluster if we are using RP and balanced initial tokens and still have 4 hot nodes.

Do these symptoms and context sound familiar to anyone? Does anyone have any suggestions as to how to address this kind of case, i.e. disproportionate write load?

Thanks,
Jeremy
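
For reference, the "general spacing formula" mentioned above is usually taken to mean evenly dividing the RandomPartitioner token range (0 to 2**127) across the nodes. The following is a minimal sketch under that assumption; the function names `balanced_tokens` and `ownership` are made up for illustration and are not part of Cassandra or nodetool. It only checks how much of the ring each token owns, which is one way to confirm that a few shifted tokens from node replacements have not skewed ownership noticeably.

```python
# Sketch only: assumes RandomPartitioner with a token range of 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced initial tokens, one per node (hypothetical helper)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Return the fraction of the ring owned by each token, in sorted token order.

    A node owns the range from the previous token (exclusive) up to its own
    token (inclusive); the first token wraps around from the last one.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return [1.0]
    fractions = []
    for idx, tok in enumerate(ordered):
        prev = ordered[idx - 1]  # idx 0 wraps to the last token
        fractions.append(((tok - prev) % RING_SIZE) / RING_SIZE)
    return fractions


if __name__ == "__main__":
    tokens = balanced_tokens(20)
    for tok, frac in zip(tokens, ownership(tokens)):
        print(f"{tok:>40d}  {frac:.4f}")
```

Feeding the cluster's actual tokens into `ownership` instead of the output of `balanced_tokens` shows how far each node deviates from an even 1/20 share. Even ownership of the key range does not by itself imply even write volume per node.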