From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Why is mutation stage increasing ??
Date: Wed, 5 Oct 2011 22:05:32 +1300

Lots of hinted handoff can give you mutations…

> HintedHandoff                     0         0           1798         0                 0

1798 is somewhat high. These are the HH tasks on this node though, can you see HH running on other nodes in the cluster? What has been happening on this node?

HH is throttled to avoid this sort of thing, what version are you on?

Also looks like the disk IO could not keep up with the flushing…

FlushWriter                       0         0           5714         0               499

You need to provide some more info on what was happening to the nodes beforehand. And check the logs on all machines for errors etc.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5/10/2011, at 9:52 PM, Yi Yang wrote:

> Well what client are you using? And can you give a hint to your node hardware?
> Sent from my BlackBerry® wireless device
>
> From: Philippe
> Date: Wed, 5 Oct 2011 10:33:21 +0200
> To: user
> ReplyTo: user@cassandra.apache.org
> Subject: Why is mutation stage increasing ??
>
> Hello,
> I have my 3-node, RF=3 cluster acting strangely. Can someone shed some light on what is going on?
> It was stuck for a couple of hours (all clients TimedOut). nodetool tpstats showed huge increasing MutationStages (in the hundreds of thousands).
> I restarted one node and it took a while to replay GB of commitlog. I've shut down all clients that write to the cluster and it's just weird.
>
> All nodes are still showing huge MutationStages including the new one and it's either increasing or stable. The pending count is stuck at 32.
> Compactionstats shows no compaction on 2 nodes and dozens of Scrub compactions (all at 100%) on the 3rd one. This is a scrub I did last week when I encountered assertion errors.
> Netstats shows no streams being exchanged at any node but each one is expecting a few Responses.
>
> Any ideas?
> Thanks
>
> For example (increased to 567062 while I was writing this email)
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0    18372664517         0                 0
> RequestResponseStage              0         0    10731370183         0                 0
> MutationStage                    32    565879      295492216         0                 0
> ReadRepairStage                   0         0          23654         0                 0
> ReplicateOnWriteStage             0         0        7733659         0                 0
> GossipStage                       0         0        3502922         0                 0
> AntiEntropyStage                  0         0           1631         0                 0
> MigrationStage                    0         0              0         0                 0
> MemtablePostFlusher               0         0           5716         0                 0
> StreamStage                       0         0             10         0                 0
> FlushWriter                       0         0           5714         0               499
> FILEUTILS-DELETE-POOL             0         0            773         0                 0
> MiscStage                         0         0           1266         0                 0
> FlushSorter                       0         0              0         0                 0
> AntiEntropySessions               0         0             18         0                 0
> InternalResponseStage             0         0              0         0                 0
> HintedHandoff                     0         0           1798         0                 0
>
>
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0     1223769753
> Responses                       n/a         4     1627481305
>
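
The two signals picked out of the tpstats output in this thread (a large Pending count on MutationStage and a non-zero "All time blocked" count on FlushWriter) can be checked mechanically when comparing several nodes. The following is only a minimal sketch: it assumes the five-column layout shown in the quoted output, the names `parse_tpstats` and `flag_pools` are hypothetical helpers (not part of nodetool), and the pending threshold of 1000 is an arbitrary illustration, not a recommended value.

```python
import re

# One pool per row: name followed by five integer columns, as in the
# tpstats output quoted in this thread (layout is an assumption).
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$")


def parse_tpstats(text):
    """Parse tpstats-style text into {pool_name: {column: int}}."""
    pools = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, blank lines, or rows with another layout
        name = match.group(1)
        active, pending, completed, blocked, all_time_blocked = (
            int(value) for value in match.groups()[1:]
        )
        pools[name] = {
            "active": active,
            "pending": pending,
            "completed": completed,
            "blocked": blocked,
            "all_time_blocked": all_time_blocked,
        }
    return pools


def flag_pools(pools, pending_threshold=1000):
    """Return pools with a large backlog or any all-time blocked tasks.

    pending_threshold is a hypothetical cut-off chosen for illustration.
    """
    return [
        name
        for name, stats in pools.items()
        if stats["pending"] >= pending_threshold or stats["all_time_blocked"] > 0
    ]


# A few rows copied from the output quoted in this thread.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0    18372664517         0                 0
MutationStage                    32    565879      295492216         0                 0
FlushWriter                       0         0           5714         0               499
HintedHandoff                     0         0           1798         0                 0
"""

if __name__ == "__main__":
    print(flag_pools(parse_tpstats(SAMPLE)))  # ['MutationStage', 'FlushWriter']
```

Running the same check against the output saved from each node would show whether the backlog and the blocked flushes are confined to one machine or present across the cluster.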