hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Need to increase the default number of connections to zookeeper
Date Tue, 07 Jul 2015 10:50:43 GMT
> - 8 GB RAM

I guess that's a typo, Minho. :-) AFAIK, each node has 192 GB of memory.

+1. We need to increase the default maxClientCnxns, since modern
machines have enough RAM.
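
A rough sketch of the arithmetic behind this, in plain Java. It assumes that
each BSP task opens one ZooKeeper connection from its host's IP address and
that bsp.tasks.maximum applies per machine; other processes on the host may
hold connections too. ZkConnectionBudget and its methods are hypothetical
names for illustration, not Hama or ZooKeeper APIs.

```java
// Hypothetical helper for illustration only; not part of Hama or ZooKeeper.
final class ZkConnectionBudget {

    /**
     * Returns true if one host would need more concurrent ZooKeeper
     * connections than the per-IP limit (maxClientCnxns) allows.
     * Assumption: one connection per task.
     */
    public static boolean exceedsLimit(int tasksPerHost, int maxClientCnxns) {
        // In ZooKeeper, maxClientCnxns = 0 removes the per-IP limit entirely.
        if (maxClientCnxns == 0) {
            return false;
        }
        return tasksPerHost > maxClientCnxns;
    }

    /** Worst-case total heap in GB if every task slot uses its full -Xmx. */
    public static long worstCaseHeapGb(int tasksPerHost, long xmxMb) {
        return tasksPerHost * xmxMb / 1024;
    }

    public static void main(String[] args) {
        int tasks = 40; // bsp.tasks.maximum from the report below

        System.out.println(exceedsLimit(tasks, 30));      // true: the reported failure case
        System.out.println(exceedsLimit(tasks, 300));     // false: the proposed default
        System.out.println(worstCaseHeapGb(tasks, 4096)); // 160
    }
}
```

With 40 task slots at -Xmx4096m, the worst-case heap is about 160 GB, which
fits in 192 GB but not in 8 GB, so the RAM figure does look like a typo.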

On Tue, Jul 7, 2015 at 7:13 PM, 김민호 <minwise.kim@samsung.com> wrote:
> Hi all,
>
>
>
> Recently, I set up a Hama cluster on 2 machines.
>
> Their specification is as follows:
>
> - 8 GB RAM
>
> - 12 TB HDD
>
> - (I don't remember the CPU spec.)
>
>
>
> To run Hama jobs, I set bsp.tasks.maximum=40 and
> bsp.child.java.opts=-Xmx4096m in hama-site.xml (the rest of the settings
> are omitted).
>
> I then ran the Pi Estimator and FastGraphGen examples, but got the errors
> below.
>
>
>
> attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201507071627_0001/peers/cluster-0:61029
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.isExists(ZKSyncClient.java:108)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:261)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.registerTask(ZooKeeperSyncClientImpl.java:279)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.register(ZooKeeperSyncClientImpl.java:261)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.initializeSyncService(BSPPeerImpl.java:305)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:185)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
>
> attempt_201507071627_0001_000023_0: 15/07/07 16:27:40 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201507071627_0001/peers/cluster-0:61029
> attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:135)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:281)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.registerTask(ZooKeeperSyncClientImpl.java:279)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.register(ZooKeeperSyncClientImpl.java:261)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.initializeSyncService(BSPPeerImpl.java:305)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:185)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
>
> attempt_201507071627_0001_000023_0: 15/07/07 16:27:42 ERROR sync.ZKSyncClient: Error checking zk path /bsp/job_201507071627_0001/sync/-1
> attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201507071627_0001/sync/-1
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.isExists(ZKSyncClient.java:108)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:261)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:100)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
>
> attempt_201507071627_0001_000023_0: 15/07/07 16:27:44 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201507071627_0001/sync/-1
> attempt_201507071627_0001_000023_0: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> attempt_201507071627_0001_000023_0:     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:135)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:281)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:100)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
>
> attempt_201507071627_0001_000023_0: 15/07/07 16:27:46 FATAL bsp.GroomServer: SyncError from child
> attempt_201507071627_0001_000023_0: org.apache.hama.bsp.sync.SyncException
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:138)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.doFirstSync(BSPPeerImpl.java:312)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.BSPPeerImpl.<init>(BSPPeerImpl.java:238)
> attempt_201507071627_0001_000023_0:     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1251)
>
> 15/07/07 16:27:48 INFO bsp.BSPJobClient: Job failed.
>
>
>
> This is a ZK error. Hama tasks try to get the /bsp node from ZooKeeper and
> fail.
>
> This happens because hama.zookeeper.property.maxClientCnxns is 30 in
> hama-default.xml.
>
> The problem occurs whenever the maximum number of tasks is larger than
> that limit.
>
> To solve the problem, Hama has a setting to increase the number of
> connections to ZK.
>
>
>
> <property>
>
>     <name>hama.zookeeper.property.maxClientCnxns</name>
>
>     <value>100</value>
>
> </property>
>
>
>
> So we should raise the default number of connections above 100, because
> server hardware is much more capable than it used to be.
>
> If you agree, I will change the default value to 300.
>
>
>
> Best regards,
>
> Minho Kim
>
>
>



-- 
Best Regards, Edward J. Yoon
