Return-Path: X-Original-To: apmail-zookeeper-user-archive@www.apache.org Delivered-To: apmail-zookeeper-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9749F439E for ; Thu, 7 Jul 2011 21:34:18 +0000 (UTC) Received: (qmail 8167 invoked by uid 500); 7 Jul 2011 21:34:18 -0000 Delivered-To: apmail-zookeeper-user-archive@zookeeper.apache.org Received: (qmail 8142 invoked by uid 500); 7 Jul 2011 21:34:17 -0000 Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@zookeeper.apache.org Delivered-To: mailing list user@zookeeper.apache.org Received: (qmail 8132 invoked by uid 99); 7 Jul 2011 21:34:17 -0000 Received: from minotaur.apache.org (HELO minotaur.apache.org) (140.211.11.9) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 07 Jul 2011 21:34:17 +0000 Received: from localhost (HELO mail-iw0-f170.google.com) (127.0.0.1) (smtp-auth username phunt, mechanism plain) by minotaur.apache.org (qpsmtpd/0.29) with ESMTP; Thu, 07 Jul 2011 21:34:17 +0000 Received: by iwn36 with SMTP id 36so1376774iwn.15 for ; Thu, 07 Jul 2011 14:34:16 -0700 (PDT) MIME-Version: 1.0 Received: by 10.42.148.9 with SMTP id p9mr1334790icv.19.1310074456518; Thu, 07 Jul 2011 14:34:16 -0700 (PDT) Received: by 10.42.230.2 with HTTP; Thu, 7 Jul 2011 14:34:16 -0700 (PDT) In-Reply-To: <4E162396.8080306@nicira.com> References: <4E15EF86.90200@nicira.com> <4E162396.8080306@nicira.com> Date: Thu, 7 Jul 2011 14:34:16 -0700 Message-ID: Subject: Re: flood of "outstanding proposal" messages From: Patrick Hunt To: Jeremy Stribling Cc: user@zookeeper.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable When you are stressing the system outstanding proposals can backup, the entirety of which will be logged at TRACE level each time an ack is processed. Alone this should not cause any problems other than the large logs you highlighted in your original post. Patrick On Thu, Jul 7, 2011 at 2:22 PM, Jeremy Stribling wrote: > Yes, thanks, we definitely will. =A0During this test we were trying to tr= ack > down some other problem we were having, which is why the debugging level = is > turned up. =A0Are you saying that whatever underlying issue caused these > hundreds of megabytes of messages would only happen in TRACE mode? > > On 07/07/2011 02:12 PM, Patrick Hunt wrote: >> >> Hi Jeremy, you're running with TRACE logging turned on which can be >> very verbose, I'd suggest you run with INFO level in normal >> circumstances. >> >> Patrick >> >> On Thu, Jul 7, 2011 at 10:40 AM, Jeremy Stribling >> =A0wrote: >> >>> >>> Has anyone ever seen Zookeeper explode with a flood of "outstanding >>> proposal" messages? =A0I'm running 3.3.3* with three nodes (under fairl= y >>> stressful client load), and all of the sudden one of the nodes' logs >>> starts >>> filling up with these messages: >>> >>> 2011-07-06 18:37:51,161 628684 [ProcessThread:-1] TRACE >>> org.apache.zookeeper.server.PrepRequestProcessor =A0- >>> :Psessionid:0x5d310234eaf10002 type:delete cxid:0x4e15574f >>> zxid:0xfffffffffffffffe txntype:unknown >>> >>> reqpath:/zkrsm/cpt-0000000000000009-000000000000afc8/0000000000000009_r= ecord0000003293 >>> 2011-07-06 18:37:51,162 628685 [ProcessThread:-1] DEBUG >>> org.apache.zookeeper.server.quorum.CommitProcessor =A0- Processing >>> request:: >>> sessionid:0x5d310234eaf10002 type:delete cxid:0x4e15574f zxid:0x4000012= 1c >>> txntype:2 reqpath:n/a >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- Ack zxid: 0x4000011fb >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001204 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x40000120d >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001207 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001215 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x4000011fe >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001208 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001214 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001219 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001206 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001201 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001213 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x4000011ff >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x40000121a >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001200 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x40000121b >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001209 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001212 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x4000011fc >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001211 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001203 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x40000120a >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x4000011fd >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001210 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001218 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001202 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x40000120b >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001216 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x40000120f >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x400001217 >>> 2011-07-06 18:37:51,162 628685 [SyncThread:177] TRACE >>> org.apache.zookeeper.server.quorum.Leader =A0- outstanding proposal: >>> 0x40000120c >>> >>> It goes on like that for hundreds of megabytes over a few minutes, unti= l >>> we >>> killed the test. =A0As far as I can tell, there were no node churn even= ts >>> that >>> sparked this. >>> >>> I can provide the full set of logs and/or file a JIRA, but I thought I'= d >>> do >>> a quick check first to see if this sparked anyone's memory. >>> >>> Thanks, >>> >>> Jeremy >>> >>> * I have applied a few patches on top of 3.3.3, namely a homegrown, >>> temporary patch for ZOOKEEPER-1046 and one for ZOOKEEPER-1060. >>> >>> >>> >