Subject: Re: 4/20 nodes get disproportionate amount of mutations
From: aaron morton <aaron@thelastpickle.com>
Date: Tue, 23 Aug 2011 20:43:17 +1200
To: user@cassandra.apache.org
In-Reply-To: <549D0947-C093-4C7F-95A9-1786F84166B1@gmail.com>
Message-Id: <76938F68-F558-424F-9079-3E32BA376360@thelastpickle.com>
References: <4701A184-927B-4F07-96A6-7919F73AA110@gmail.com> <549D0947-C093-4C7F-95A9-1786F84166B1@gmail.com>

Dropped messages in ReadRepair are odd. Are you also dropping mutations?

There are two tasks performed on the ReadRepair stage: first the digests are compared on this stage, and secondly the repair itself happens on this stage. Comparing digests is quick. Doing the repair could take a bit longer; all the CFs returned are collated, filtered, and deletes removed.

We don't do background Read Repair on range scans; they do have foreground digest checking though. What CL are you using?

begin crazy theory:

	Could there be a very big row that is out of sync? The increased RR would result in mutations being sent back to the replicas, which would give you a hot spot in mutations.
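A quick way to check whether mutations are also being dropped, and whether the ReadRepair stage has a backlog, is nodetool tpstats. A sketch only: the host is a placeholder, and the exact pool and column names vary between Cassandra versions of this era.

```shell
# Show thread-pool activity and dropped-message counts on a hot node.
# Replace <hot-node> with the node's address; the JMX port is whatever
# is configured in cassandra-env.sh (an assumption - check your install).
nodetool -h <hot-node> tpstats

# In the output, look at:
#   ReadRepairStage - pending/active tasks (a growing backlog here means
#                     the repairs themselves are slow, not the digest checks)
#   Dropped message counts for READ_REPAIR and MUTATION near the bottom
```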
	Check max compacted row size on the hot nodes.

	Turn the logging up to DEBUG on the hot machines for o.a.c.service.RowRepairResolver and look for the "resolve: …" message; it has the time taken.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 7:52 PM, Jeremy Hanna wrote:

> 
> On Aug 23, 2011, at 2:25 AM, Peter Schuller wrote:
> 
>>> We've been having issues where as soon as we start doing heavy writes (via hadoop) recently, it really hammers 4 nodes out of 20. We're using random partitioner and we've set the initial tokens for our 20 nodes according to the general spacing formula, except for a few token offsets as we've replaced dead nodes.
>> 
>> Is the hadoop job iterating over keys in the cluster in token order perhaps, and you're generating writes to those keys? That would explain a "moving hotspot" along the cluster.
> 
> Yes - we're iterating over all the keys of particular column families, doing joins using pig as we enrich and perform measure calculations. When we write, we're usually writing out for a certain small subset of keys which shouldn't have hotspots with RandomPartitioner afaict.
> 
>> 
>> -- 
>> / Peter Schuller (@scode on twitter)
> 
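For reference, the two checks suggested above might look roughly like this on a hot node. A sketch under assumptions: the host is a placeholder, the cfstats field name is from 0.8-era nodetool output, and the log4j file path applies to the log4j-based setups of that era.

```shell
# 1. Check the max compacted row size - a very large row that is out of
#    sync would make each read repair expensive. Look for the
#    "Compacted row maximum size" line for the hot column families.
nodetool -h <hot-node> cfstats

# 2. Turn up logging for the repair resolver on the hot machines by
#    adding this line to conf/log4j-server.properties, then watch the
#    log for the "resolve: ..." DEBUG message, which includes the time
#    taken:
#
#    log4j.logger.org.apache.cassandra.service.RowRepairResolver=DEBUG
```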