hama-dev mailing list archives

From Minho Kim <mi...@apache.org>
Subject Re: Need to increase the default number of connections to zookeeper
Date Tue, 07 Jul 2015 12:17:00 GMT
Oops,
I made a mistake. Edward is right. Each node has 192 GB of RAM.

Thanks,
Minho Kim

2015-07-07 19:50 GMT+09:00 Edward J. Yoon <edwardyoon@apache.org>:

> > - 8 GB RAM
>
> That looks like a typo, Minho. :-) AFAIK, each node has 192 GB of memory.
>
> +1, we need to increase the default maxClientCnxns, since modern
> machines have enough RAM.
>
> On Tue, Jul 7, 2015 at 7:13 PM, 김민호 <minwise.kim@samsung.com> wrote:
> > Hi all,
> >
> >
> >
> > Recently, I set up a Hama cluster on 2 machines.
> >
> > Each machine has the following specification:
> >
> > - 8 GB RAM
> >
> > - 12 TB HDD
> >
> > - (I don't remember the CPU spec.)
> >
> >
> >
> > To run Hama jobs, I set bsp.tasks.maximum=40 and
> > bsp.child.java.opts=-Xmx4096m in hama-site.xml (the other settings are
> > omitted here).
> >
> > I then ran the Pi Estimator and FastGraphGen examples, but got the
> > errors below.
> >
> >
> >
> > attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201507071627_0001/peers/cluster-0:61029
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.isExists(ZKSyncClient.java:108)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:261)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.registerTask(ZooKeeperSyncClientImpl.java:279)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.register(ZooKeeperSyncClientImpl.java:261)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.initializeSyncService(BSPPeerImpl.java:305)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:185)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
> >
> > attempt_201507071627_0001_000023_0: 15/07/07 16:27:40 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201507071627_0001/peers/cluster-0:61029
> > attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:135)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:281)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.registerTask(ZooKeeperSyncClientImpl.java:279)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.register(ZooKeeperSyncClientImpl.java:261)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.initializeSyncService(BSPPeerImpl.java:305)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:185)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
> >
> > attempt_201507071627_0001_000023_0: 15/07/07 16:27:42 ERROR sync.ZKSyncClient: Error checking zk path /bsp/job_201507071627_0001/sync/-1
> > attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201507071627_0001/sync/-1
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.isExists(ZKSyncClient.java:108)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:261)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:100)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
> >
> > attempt_201507071627_0001_000023_0: 15/07/07 16:27:44 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201507071627_0001/sync/-1
> > attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> > attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:135)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:281)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:100)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
> >
> > attempt_201507071627_0001_000023_0: 15/07/07 16:27:46 FATAL bsp.GroomServer: SyncError from child
> > attempt_201507071627_0001_000023_0: org.apache.hama.bsp.sync.SyncException
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:138)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
> > attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
> >
> > 15/07/07 16:27:48 INFO bsp.BSPJobClient: Job failed.
> >
> >
> >
> > This is a ZooKeeper error: the Hama tasks try to access the /bsp node in
> > ZooKeeper and fail.
> >
> > The cause is that hama.zookeeper.property.maxClientCnxns is set to 30 in
> > hama-default.xml.
> >
> > The problem occurs whenever the maximum number of tasks is larger than
> > that value.
> >
> > To solve it, Hama has a setting that increases the number of connections
> > allowed to ZooKeeper:
> >
> >
> >
> > <property>
> >     <name>hama.zookeeper.property.maxClientCnxns</name>
> >     <value>100</value>
> > </property>
> >
> >
> >
> > So we should raise the default number of connections to more than 100,
> > because servers are much more powerful than they used to be.
> >
> > If you agree, I will change the default value to 300.
> >
> >
> >
> > Best regards,
> >
> > Minho Kim
> >
> >
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
>
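
For reference, the settings discussed in this thread could be combined in hama-site.xml along the following lines. This is only a sketch: the first two values are the ones from the original report, and 100 for maxClientCnxns is the example value from the report, not an agreed default. ZooKeeper applies maxClientCnxns per client IP address, so the assumption here is that every task on a machine opens its own connection and the limit should therefore be at least bsp.tasks.maximum.

```xml
<!-- hama-site.xml (sketch): only the settings mentioned in this thread -->
<configuration>
  <!-- Value from the original report -->
  <property>
    <name>bsp.tasks.maximum</name>
    <value>40</value>
  </property>

  <!-- Heap size for each child task JVM; value from the original report -->
  <property>
    <name>bsp.child.java.opts</name>
    <value>-Xmx4096m</value>
  </property>

  <!-- Assumption: keep this at or above bsp.tasks.maximum, since the
       failure was reported with 40 tasks against the default of 30 -->
  <property>
    <name>hama.zookeeper.property.maxClientCnxns</name>
    <value>100</value>
  </property>
</configuration>
```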
