hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Fwd: connecton loss exception
Date Wed, 16 Feb 2011 14:25:38 GMT
---------- Forwarded message ----------
From: Edward J. Yoon <edwardyoon@apache.org>
Date: Wed, Feb 16, 2011 at 11:25 PM
Subject: Re: connecton loss exception
To: "hama-user@incubator.apache.org" <hama-user@incubator.apache.org>


I've decided to add a "random communication benchmark" tool. This week
(or next week), I'll share my benchmarking experience with you. I have
20 servers (160 cores).

Thanks.

2011/2/16 Edward J. Yoon <edward@udanax.org>:
> Looks like a sync problem. Can you try it again after adding a Thread.sleep(100); line?
>
> Sent from my iPhone
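
One possible placement of that sleep, as a minimal sketch only. The SuperstepBody interface and the runWithPause helper are hypothetical stand-ins, not Hama API; the empty body in main() marks where the original bsp() code would go. The 100 iterations and the 100 ms pause are the values mentioned in this thread.

```java
public class SuperstepLoopSketch {

    /** Hypothetical stand-in for the original bsp() body; not part of Hama. */
    interface SuperstepBody {
        void run(int iteration) throws Exception;
    }

    /**
     * Runs the body the given number of times and pauses after every pass.
     * Returns the number of passes that completed.
     */
    static int runWithPause(int iterations, long pauseMillis, SuperstepBody body)
            throws Exception {
        int completed = 0;
        for (int j = 0; j < iterations; j++) {
            body.run(j);               // original bsp() code goes here
            Thread.sleep(pauseMillis); // the suggested pause between passes
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        // 100 passes with a 100 ms pause, as in the messages in this thread.
        int done = runWithPause(100, 100L, j -> { /* original bsp() code */ });
        System.out.println("completed iterations: " + done);
    }
}
```

A pause like this can only show whether the problem is timing-related; it is not a fix for whatever synchronization issue is behind it.
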
>
> On 2011. 2. 16., at 3:24 PM, Paweł Brach <braszek@gmail.com> wrote:
>
>> Yes, of course I have. My cluster is configured, and both examples,
>> PiEstimator and SerializePrinting, work (there is communication between 3
>> nodes). I modified your PiEstimator example (put everything in a
>> loop); it works for a few iterations (there is communication), and then
>> the connection is lost. The connection is later re-established, but some
>> messages are missing. It looks like the Hama framework is very unstable
>> under load, when many messages are being sent between nodes.
>> On the same cluster I've configured Apache Hadoop, and it's very stable.
>> If you have your own cluster configured, could you run my example on it? Have
>> you ever run anything more complicated than PiEstimator and
>> SerializePrinting on it?
>>
>> Cheers,
>> Pawel
>>
>> 2011/2/16 Chia-Hung Lin <clin4j@googlemail.com>
>>
>>> Have you configured zookeeper in hama-site.xml? Hama makes use of
>>> zookeeper to do node communication IIRC.
>>>
>>>   Opening socket connection to server cl5/127.0.1.1:2181
>>>
>>> suggests that only localhost is up. If this is the case, you
>>> can set the hama.zookeeper.quorum property to list your nodes,
>>> e.g.
>>>
>>> <property>
>>>   <name>hama.zookeeper.quorum</name>
>>>   <value>node1,node2,node3,node4,node5</value>
>>> </property>
>>>
>>> Hope it helps
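
In the log line above, the host name cl5 is shown next to 127.0.1.1, a loopback address. A small stdlib-only sketch for checking which quorum entries resolve to loopback on a given node follows. The class and method names are made up for illustration, and the check uses java.net.InetAddress rather than anything from Hama or ZooKeeper.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class QuorumLoopbackCheck {

    /**
     * Returns the entries of a comma-separated quorum string that resolve
     * to a loopback address (127.x.x.x or ::1) on the machine running this.
     */
    static List<String> loopbackHosts(String quorum) throws UnknownHostException {
        List<String> result = new ArrayList<>();
        for (String host : quorum.split(",")) {
            String trimmed = host.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            if (InetAddress.getByName(trimmed).isLoopbackAddress()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Pass the value of hama.zookeeper.quorum as the first argument;
        // the default is the address taken from the log line in this thread.
        String quorum = args.length > 0 ? args[0] : "127.0.1.1";
        System.out.println("loopback entries: " + loopbackHosts(quorum));
    }
}
```

If a node's own host name shows up in the output, other nodes cannot reach a ZooKeeper server bound to that address, which would be consistent with the "Connection refused" error reported here. Whether that is the actual cause on this cluster is an assumption.
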
>>>
>>> 2011/2/15 Paweł Brach <braszek@gmail.com>:
>>>> Hello,
>>>>
>>>> Over the last few days I've been testing Hama, and today I found a
>>>> strange error in the Hama framework. If you run a simple job with more than
>>>> a few supersteps, the following error occurs:
>>>>
>>>> 2011-02-15 15:13:55,934 ERROR org.apache.hama.bsp.BSPPeer:
>>>> 2011-02-15 15:13:56,525 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cl5/127.0.1.1:2181
>>>> 2011-02-15 15:13:56,526 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>>> java.net.ConnectException: Connection refused
>>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
>>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
>>>> 2011-02-15 15:13:56,626 ERROR org.apache.hama.bsp.BSPPeer:
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
>>>>
>>>> You can reproduce this by running PiEstimator (the newest source code from
>>>> svn) with a small change: put the whole body of the bsp() method in a for
>>>> loop. That is, wrap it as follows:
>>>>
>>>> for (int j = 0; j < 100; j++) {
>>>> // original bsp() code
>>>> }
>>>>
>>>> When I try to run it, the framework hangs and the error mentioned above
>>>> occurs.
>>>>
>>>> Your help will be appreciated.
>>>>
>>>> Cheers,
>>>>
>>>> --
>>>> Pawel Brach
>>>>
>>>
>>>
>>>
>>> --
>>> ChiaHung Lin @ nuk, tw.
>>>
>>
>>
>>
>> --
>> Paweł Brach
>



--
Best Regards, Edward J. Yoon
http://blog.udanax.org
http://twitter.com/eddieyoon


