Date: Wed, 8 Jul 2015 14:21:16 +0900
From: "Edward J. Yoon"
To: "dev@hama.apache.org"
Subject: Re: Need to increase the default number of connections to zookeeper
References: <007001d0b89d$90c9e0b0$b25da210$@samsung.com>

Just FYI, I have just committed the change below:

Index: conf/hama-default.xml
===================================================================
--- conf/hama-default.xml (revision 1689791)
+++ conf/hama-default.xml (working copy)
@@ -262,7 +262,7 @@
     hama.zookeeper.property.maxClientCnxns
-    30
+    100
     Property from ZooKeeper's config zoo.cfg.
     Limit on number of concurrent connections (at the socket level) that a

On Tue, Jul 7, 2015 at 9:17 PM, Minho Kim wrote:
> Oops,
> I made a mistake. Edward is right. Each node has 192 GB of RAM.
>
> Thanks,
> Minho Kim
>
> 2015-07-07 19:50 GMT+09:00 Edward J. Yoon:
>
>> > - 8 GB RAM
>>
>> I guess that's a typo, Minho. :-) AFAIK, each node has 192 GB of memory.
>>
>> +1, we need to increase the default maxClientCnxns since modern
>> machines have plenty of RAM.
>>
>> On Tue, Jul 7, 2015 at 7:13 PM, 김민호 wrote:
>> > Hi all,
>> >
>> > Recently, I set up a Hama cluster using 2 machines.
>> >
>> > Their specifications are as follows:
>> >
>> > - 8 GB RAM
>> > - 12 TB HDD
>> > - (I don't remember the CPU spec.)
>> >
>> > To run Hama jobs, I set bsp.tasks.maximum=40 and
>> > bsp.child.java.opts=-Xmx4096m in hama-site.xml (other settings omitted).
>> >
>> > I then ran the Pi Estimator and FastGraphGen examples, but got the
>> > errors below:
>> >
>> > attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201507071627_0001/peers/cluster-0:61029
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.isExists(ZKSyncClient.java:108)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:261)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.registerTask(ZooKeeperSyncClientImpl.java:279)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.register(ZooKeeperSyncClientImpl.java:261)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.initializeSyncService(BSPPeerImpl.java:305)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:185)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
>> > attempt_201507071627_0001_000023_0: 15/07/07 16:27:40 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201507071627_0001/peers/cluster-0:61029
>> > attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:135)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:281)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.registerTask(ZooKeeperSyncClientImpl.java:279)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.register(ZooKeeperSyncClientImpl.java:261)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.initializeSyncService(BSPPeerImpl.java:305)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:185)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
>> > attempt_201507071627_0001_000023_0: 15/07/07 16:27:42 ERROR sync.ZKSyncClient: Error checking zk path /bsp/job_201507071627_0001/sync/-1
>> > attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201507071627_0001/sync/-1
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.isExists(ZKSyncClient.java:108)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:261)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:100)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
>> > attempt_201507071627_0001_000023_0: 15/07/07 16:27:44 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201507071627_0001/sync/-1
>> > attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>> > attempt_201507071627_0001_000023_0: at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:135)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:281)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:100)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
>> > attempt_201507071627_0001_000023_0: 15/07/07 16:27:46 FATAL bsp.GroomServer: SyncError from child
>> > attempt_201507071627_0001_000023_0: org.apache.hama.bsp.sync.SyncException
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:138)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
>> > attempt_201507071627_0001_000023_0: at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
>> >
>> > 15/07/07 16:27:48 INFO bsp.BSPJobClient: Job failed.
>> >
>> > This is a ZK error: the Hama tasks try to access the /bsp node in
>> > ZooKeeper and fail.
>> >
>> > It happens simply because hama.zookeeper.property.maxClientCnxns is set
>> > to 30 in hama-default.xml. The problem occurs when the maximum number of
>> > tasks exceeds this limit.
>> >
>> > To solve the problem, Hama has a setting to increase the number of
>> > connections to ZK:
>> >
>> > <property>
>> >   <name>hama.zookeeper.property.maxClientCnxns</name>
>> >   <value>100</value>
>> > </property>
>> >
>> > So we should raise the default number of connections to more than 100,
>> > because server performance has improved considerably since this default
>> > was chosen.
>> >
>> > If you agree, I will change the default value to 300.
>> >
>> > Best regards,
>> >
>> > Minho Kim
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>>

-- 
Best Regards, Edward J. Yoon
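The diagnosis in this thread comes down to simple arithmetic. ZooKeeper applies maxClientCnxns per client IP address and per ensemble member, so tasks on one groom server that connect to the same ZooKeeper server share a single limit. The following is a minimal Java sketch of that check, not Hama code. It assumes each BSP task holds one ZooKeeper connection (as each failing task attempt in the trace above appears to do through ZKSyncClient). The class ZkCnxnCheck, its methods, and the reservedPerHost allowance for other ZooKeeper clients on the same machine are hypothetical. It parses a Hadoop-style hama-site.xml fragment with the JDK's DOM parser and falls back to 30, the default before this commit, when the property is absent.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Hama.
public class ZkCnxnCheck {

    // Reads name/value pairs from a Hadoop-style <configuration> XML string.
    static Map<String, String> readProperties(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new HashMap<>();
        NodeList list = doc.getElementsByTagName("property");
        for (int i = 0; i < list.getLength(); i++) {
            Element p = (Element) list.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    // ASSUMPTION: one ZooKeeper connection per BSP task slot on a host, plus
    // reservedPerHost (hypothetical) connections for other clients on that host.
    // Returns true if the per-IP limit covers all of them.
    static boolean limitIsSufficient(Map<String, String> props, int reservedPerHost) {
        int tasks = Integer.parseInt(props.get("bsp.tasks.maximum"));
        // Fallback of 30 mirrors the hama-default.xml value before this commit.
        int maxCnxns = Integer.parseInt(
                props.getOrDefault("hama.zookeeper.property.maxClientCnxns", "30"));
        return tasks + reservedPerHost <= maxCnxns;
    }

    public static void main(String[] args) throws Exception {
        String before = "<configuration>"
                + "<property><name>bsp.tasks.maximum</name><value>40</value></property>"
                + "</configuration>";
        String after = "<configuration>"
                + "<property><name>bsp.tasks.maximum</name><value>40</value></property>"
                + "<property><name>hama.zookeeper.property.maxClientCnxns</name>"
                + "<value>100</value></property>"
                + "</configuration>";
        System.out.println(limitIsSufficient(readProperties(before), 1)); // prints false
        System.out.println(limitIsSufficient(readProperties(after), 1));  // prints true
    }
}
```

With bsp.tasks.maximum=40 from the thread, the old default of 30 fails the check while the committed value of 100 passes; a real deployment may need more headroom if other processes on the same machine also talk to ZooKeeper.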