From: Akmal Abbasov
To: user@zookeeper.apache.org
Subject: Re: Transaction timeouts
Date: Wed, 18 Nov 2015 11:37:01 +0100

> On 17 Nov 2015, at 21:34, Raúl Gutiérrez Segalés wrote:
>
> On 17 November 2015 at 12:13, Akmal Abbasov wrote:
>
>> Hi Raul,
>> Thank you for your response.
>> I am running zookeeper with -Xms512m -Xmx1g options, is this enough.
>>
>
> It depends on your workload.. how many writes/read per sec are you
> expecting/seeing? Are you seeing long
> GC pauses?
> If so, you'll need more mem or bigger tick times, otherwise
> you'll miss the deadlines for the
> pings (both among learners and to clients…)
>

Where can I find this information, in particular the number of reads/writes?
This is the output of the stat command:

Server 1
Latency min/avg/max: 0/66/5212
Received: 8722
Sent: 8694
Connections: 19
Outstanding: 0
Zxid: 0xa9600002ef2
Mode: follower
Node count: 479

Server 2
Latency min/avg/max: 0/70/5252
Received: 8228
Sent: 8203
Connections: 16
Outstanding: 0
Zxid: 0xa9600002e12
Mode: leader
Node count: 479

Server 3
Latency min/avg/max: 0/0/1
Received: 140
Sent: 139
Connections: 2
Outstanding: 0
Zxid: 0xa9600002bf8
Mode: follower
Node count: 479

All the servers have the same configs.
Is -Xms512m -Xmx1g enough to handle my workload?
Moreover, I see that the load is not evenly distributed. Is this something that should be tuned manually, or is there something like the hbase/hdfs balancer that will take care of it?

>
>> Regarding the network, all of the server zk server nodes are hosted in the
>> cloud, in the same dc.
>> But according to the zk troubleshooting guide, the timeout should be
>> increased for cloud environments.
>>
>
> Yup, latency can be unpredictable in the cloud…
>
>
>> One more thing is that, I’m seeing a lot of
>> fsync-ing the write ahead log in SyncThread:1 took 2962ms which will
>> adversely effect operation latency. See the ZooKeeper troubleshooting guide
>> messages in the logs.
>>
>
> That definitely looks bad and will block everything else. What type of disc
> are you writing your logs and snapshots to? Are they
> separate volumes?

I’m using separate disks for logs and data.
But they’re HDDs, not SSDs.
So my assumption

I’ve tried to understand what is actually happening; here is a summary of the logs:

08:22:08,201 Transaction timeout
08:22:08,596 - 08:22:25,441 ZookeeperServer not running
08:22:24,927 New election

Everything starts with ‘Transaction timeout’ on the leader, which causes ‘Exception when following the leader’ on the learners.
Then all the zookeeper processes shut down, a new election takes place, and the zookeeper processes start again.

One more thing: what’s the best way to update the configs without downtime?

Thank you.

Regards,
Akmal
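
P.S. A minimal sketch, in Python, of how the per-server stat figures quoted above could be compared side by side. It only parses the text as pasted in this message; the names parse_stat, summarize and SERVERS are made up for illustration and are not part of ZooKeeper. It assumes the usual zxid layout (epoch in the high 32 bits, counter in the low 32 bits). Note that Received is a cumulative counter, so it shows each server's share of the traffic rather than a rate; a requests-per-second figure would need two samples taken some time apart.

```python
def parse_stat(text):
    """Turn one server's `stat` text into a dict of numbers and strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "key: value"
        key, value = key.strip(), value.strip()
        if key == "Latency min/avg/max":
            info["latency"] = tuple(int(v) for v in value.split("/"))
        elif key in ("Received", "Sent", "Connections", "Outstanding", "Node count"):
            info[key.lower().replace(" ", "_")] = int(value)
        elif key == "Zxid":
            zxid = int(value, 16)
            info["epoch"] = zxid >> 32            # high 32 bits (assumed layout)
            info["counter"] = zxid & 0xFFFFFFFF   # low 32 bits (assumed layout)
        elif key == "Mode":
            info["mode"] = value
    return info


# The stat output exactly as quoted in this message.
SERVERS = {
    "Server 1": """Latency min/avg/max: 0/66/5212
Received: 8722
Sent: 8694
Connections: 19
Outstanding: 0
Zxid: 0xa9600002ef2
Mode: follower
Node count: 479""",
    "Server 2": """Latency min/avg/max: 0/70/5252
Received: 8228
Sent: 8203
Connections: 16
Outstanding: 0
Zxid: 0xa9600002e12
Mode: leader
Node count: 479""",
    "Server 3": """Latency min/avg/max: 0/0/1
Received: 140
Sent: 139
Connections: 2
Outstanding: 0
Zxid: 0xa9600002bf8
Mode: follower
Node count: 479""",
}


def summarize(servers):
    """Parse every server and add its share of all received packets."""
    rows = {name: parse_stat(text) for name, text in servers.items()}
    total = sum(r["received"] for r in rows.values())
    for r in rows.values():
        r["share"] = r["received"] / total
    return rows


if __name__ == "__main__":
    for name, r in summarize(SERVERS).items():
        lo, avg, hi = r["latency"]
        print(f"{name} ({r['mode']}): max latency {hi} ms, "
              f"{r['connections']} connections, {r['share']:.1%} of received packets")
```

Running it against fresh stat output from each node at regular intervals would show whether the connection imbalance and the latency spikes persist over time.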