hama-user mailing list archives

From Chia-Hung Lin <cli...@googlemail.com>
Subject Re: Zookeeper - Problems?
Date Mon, 17 Jun 2013 14:01:51 GMT
Network load depends on bandwidth, packet size, frequency of
communication, etc., even when the instances are reserved. For
example, suppose only two servers are running on a network and
server A floods its peer, server B, with messages (large packets or a
high sending frequency). That may leave server B unresponsive or
unable to respond in time.
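
To illustrate one common way a client can tolerate a peer that answers
late, here is a minimal sketch of retrying with exponential backoff.
It is only an illustration under stated assumptions, not what Hama 0.6
or ZooKeeper does internally: TransientConnectionException and
RetryWithBackoff are hypothetical names, with the exception standing in
for a transient error such as ZooKeeper's ConnectionLossException.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a transient error such as a lost connection. */
class TransientConnectionException extends Exception {
    TransientConnectionException(String message) {
        super(message);
    }
}

/** Illustrative helper; not part of Hama or ZooKeeper. */
class RetryWithBackoff {

    /**
     * Runs op, retrying only on TransientConnectionException.
     * The wait doubles after each failed attempt so that a busy peer
     * gets progressively more time to recover.
     */
    static <T> T call(Callable<T> op, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        TransientConnectionException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (TransientConnectionException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last; // every attempt failed with a transient error
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated operation: fails twice, then succeeds.
        String result = call(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new TransientConnectionException("peer did not answer in time");
            }
            return "barrier entered";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

One caveat if something like this is wrapped around a real ZooKeeper
call: after a connection loss the client cannot tell whether the
request reached the server, so a retried create may find that the znode
already exists and has to handle that case.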



On 17 June 2013 18:24, Edward J. Yoon <edwardyoon@apache.org> wrote:
> Please check the ZooKeeper logs to find the cause of the
> ConnectionLossException. There are many possibilities, such as a full GC,
> heavy swap space usage, or an expired session.
>
> I guess the answer lies in the sentence "stopped working after
> 4600 supersteps".
>
> On Mon, Jun 17, 2013 at 6:11 PM, Sascha Jonas
> <sascha.jonas@student.htw-berlin.de> wrote:
>> The servers are reserved for Apache Hama, so there is no other network
>> traffic. I tested it on three other PCs at another location but with the
>> same configuration and got the same errors :(
>>
>> On Sun, 16.06.2013, 16:44, Chia-Hung Lin wrote:
>>> Have you checked whether the underlying network is busy when the error
>>> happens?
>>>
>>> I can't be sure, but the symptom suggests that heavy network
>>> traffic is causing the zk connection loss.
>>>
>>>
>>>
>>> On 16 June 2013 20:22, Sascha Jonas <sascha.jonas@student.htw-berlin.de>
>>> wrote:
>>>> Hey,
>>>>
>>>> I am using Apache Hama on a small cluster of two computers. It works
>>>> fine with a small number of supersteps, but every time I try
>>>> lots of iterations, e.g. 10000, it crashes.
>>>>
>>>> Right now it stopped working after 4600 supersteps. 8 of 16 tasks are
>>>> still running while the log shows some errors.
>>>>
>>>> I am using Apache Hama 0.6 and the built-in ZooKeeper. Should I go with a
>>>> newer Hama or ZooKeeper version?
>>>>
>>>> 13/06/16 00:14:14 ERROR sync.ZKSyncClient: Error creating zk path /bsp/job_201306091733_0009/sync/4276
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp/job_201306091733_0009/sync/4276
>>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>>>>         at org.apache.hama.bsp.sync.ZKSyncClient.createZnode(ZKSyncClient.java:138)
>>>>         at org.apache.hama.bsp.sync.ZKSyncClient.writeNode(ZKSyncClient.java:290)
>>>>         at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:99)
>>>>         at org.apache.hama.bsp.BSPPeerImpl.enterBarrier(BSPPeerImpl.java:474)
>>>>         at org.apache.hama.bsp.BSPPeerImpl.sync(BSPPeerImpl.java:428)
>>>>         at de.distMLP.Base_MLP_Trainer.calculateAndWriteCost(Base_MLP_Trainer.java:90)
>>>>         at de.distMLP.Train_MultilayerPerceptron$MultilayerPerceptron_Trainer.bsp(Train_MultilayerPerceptron.java:57)
>>>>         at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:168)
>>>>         at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>>>>         at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1262)
>>>> 13/06/16 00:14:15 ERROR distMLP.Train_MultilayerPerceptron$MultilayerPerceptron_Trainer: org.apache.hama.bsp.sync.SyncException
>>>> org.apache.hama.bsp.sync.SyncException
>>>>         at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.enterBarrier(ZooKeeperSyncClientImpl.java:137)
>>>>         at org.apache.hama.bsp.BSPPeerImpl.enterBarrier(BSPPeerImpl.java:474)
>>>>         at org.apache.hama.bsp.BSPPeerImpl.sync(BSPPeerImpl.java:428)
>>>>         at de.distMLP.Base_MLP_Trainer.calculateAndWriteCost(Base_MLP_Trainer.java:90)
>>>>         at de.distMLP.Train_MultilayerPerceptron$MultilayerPerceptron_Trainer.bsp(Train_MultilayerPerceptron.java:57)
>>>>         at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:168)
>>>>         at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>>>>         at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1262)
>>>>
>>>
>>
>>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
