Subject: Re: Scaling problems
From: Ian Soboroff <isoboroff@gmail.com>
To: user@cassandra.apache.org
Date: Sat, 22 May 2010 22:07:03 -0400

I'll try this. HH backs up because nodes are failing. I haven't read the code, but why should HH suck CPU? As I understand it, there's nothing to hand off until the destination comes back up, and Gossip should tell us that, no? In the interim, it's just a cache of writes waiting to be sent.

Is there some way to tell the system "Just stop caring, I'm just writing, let's worry about leveling out when I get around to wanting to read?"

Ian

On Fri, May 21, 2010 at 9:06 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> On Fri, May 21, 2010 at 9:09 AM, Ian Soboroff <isoboroff@gmail.com> wrote:
> > HINTED-HANDOFF-POOL               1       158            23
>
> this is your smoking gun.  HH tasks suck a ton of CPU and you have 158
> backed up.
>
> i would just blow the HH files away from your data/system directory,
> restart the node, and run repair (assuming all your other nodes are
> alive again).
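
For anyone who wants to check for the same symptom on their own nodes, here is a minimal sketch of reading the thread-pool line quoted in this thread. It assumes the three numbers are the active, pending, and completed task counts, in that order; the reply treats 158 as the backlog, which fits that reading. The names PoolStats, parse_tpstats_line, and backed_up are hypothetical helpers for illustration, not part of Cassandra or nodetool, and the threshold is an arbitrary example value.

```python
from typing import Iterable, List, NamedTuple


class PoolStats(NamedTuple):
    """One thread-pool row; field order is an assumption (see lead-in)."""
    name: str
    active: int
    pending: int
    completed: int


def parse_tpstats_line(line: str) -> PoolStats:
    """Parse one whitespace-separated row such as the one quoted in the thread."""
    # Drop any leading email quote markers ("> > ") before splitting.
    name, active, pending, completed = line.lstrip("> ").split()
    return PoolStats(name, int(active), int(pending), int(completed))


def backed_up(rows: Iterable[PoolStats], threshold: int = 0) -> List[PoolStats]:
    """Return the pools whose pending count exceeds the given threshold."""
    return [row for row in rows if row.pending > threshold]


if __name__ == "__main__":
    row = parse_tpstats_line("> > HINTED-HANDOFF-POOL     1     158     23")
    for pool in backed_up([row], threshold=100):
        print(f"{pool.name}: {pool.pending} pending tasks")
```

A large pending count on the hinted handoff pool is only a hint about where to look; what counts as "backed up" depends on the cluster, so the threshold should be chosen per deployment.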