hadoop-zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Hunt <ph...@apache.org>
Subject Re: A question about "Connection timed out" and "operation timeout"
Date Thu, 20 Aug 2009 16:35:23 GMT
Hi Qian, it would good if you could create a jira for this:
https://issues.apache.org/jira/browse/ZOOKEEPER

include both the client logs and the server logs (for overlapping 
client/server time period where you see the problem). also the server 
config if you're using a quorum vs standalone. If you could also include 
some/all of the client side code you have that would be useful for us to 
review.

are you doing anything in particular during this time period where you 
see a problem? perhaps load testing of the server?

2 sec timeout is pretty low. the client needs to send a heartbeat every 
0.6 seconds (1/3 timeout), and if it doesn't hear back from the server 
in another 0.6 seconds it will close the connection and attempt to 
re-connect to a(nother) server. also if the server doesn't hear from the 
client every 2 seconds it will expire the session. in general this is 
ok, but if you have any network issues, or periodic heavy load on the 
server this could cause a problem. typically we suggest timeouts in the 
20-30 second range, but we do have some ppl that run in 5-10 second 
range regularly (but they have solid lans with low latency connections)

re the ephemeral node - did you try the suggestions I had in my previous 
email? (attached) (please include that on the jira as well)

Patrick

Qian Ye wrote:
> Hi Ben:
> 
> I used multi-thread library, and the session timeout is set to 2000 when the
> zookeeper handler was initialized.
> 
> On Thu, Aug 20, 2009 at 9:52 PM, Benjamin Reed <breed@yahoo-inc.com> wrote:
> 
>> are you using the single threaded or multithreaded C library? the exceeded
>> deadline message means that our thread was supposed to get control after a
>> certain period, but we got control that many milliseconds late. what is your
>> session timeout?
>>
>> ben
>>
>> ________________________________________
>> From: Qian Ye [yeqian.zju@gmail.com]
>> Sent: Thursday, August 20, 2009 3:17 AM
>> To: zookeeper-user
>> Subject: A question about "Connection timed out" and "operation timeout"
>>
>> Hi guys:
>>
>> I met the problem again: an ephemeral node disappeared, and I found it
>> because my application got a "operation timeout"
>>
>> My application which created an ephemeral node at the zookeeper server,
>> printed the following log
>>
>> *WARNING: 08-20 03:09:20:  auto * 182894118176
>> [logid:][reqip:][auto_exchanger_zk_basic.cpp:605]get children
>> fail.[/forum/elect_nodes][-7][operation timeout]*
>>
>> and the Zookeeper client printed the following log (the log level is INFO)
>>
>> 2009-08-19 21:36:18,067:3813(0x9556c520):ZOO_INFO@log_env@545: Client
>> environment:zookeeper.version=zookeeper C client 3.2.0
>>    606 2009-08-19 21:36:18,067:3813(0x9556c520):ZOO_INFO@log_env@549:
>> Client environment:host.name=jx-ziyuan-test00.jx.baidu.com
>>    607 2009-08-19 21:36:18,068:3813(0x9556c520):ZOO_INFO@log_env@557:
>> Client environments.name=Linux
>>    608 2009-08-19 21:36:18,068:3813(0x9556c520):ZOO_INFO@log_env@558:
>> Client environments.arch=2.6.9-52bs
>>    609 2009-08-19 21:36:18,068:3813(0x9556c520):ZOO_INFO@log_env@559:
>> Client environments.version=#2 SMP Fri Jan 26 13:34:38 CST 2007
>>    610 2009-08-19 21:36:18,068:3813(0x9556c520):ZOO_INFO@log_env@567:
>> Client environment:user.name=club
>>    611 2009-08-19 21:36:18,068:3813(0x9556c520):ZOO_INFO@log_env@577:
>> Client environment:user.home=/home/club
>>    612 2009-08-19 21:36:18,068:3813(0x9556c520):ZOO_INFO@log_env@589:
>> Client environment:user.dir=/home/club/user/luhongbo/auto-exchanger
>>    613 2009-08-19 21:36:18,068:3813(0x9556c520):ZOO_INFO@zookeeper_init
>> @613:
>> Initiating client connection,
>> host=127.0.0.1:2181,127.0.0.1:2182sessionTimeout=2000 wa
>> tcher=0x408c56 sessionId=0x0
>> sessionPasswd=<null> context=(nil) flags=0
>>    614 2009-08-19 21:36:18,069:3813(0x41401960):ZOO_INFO@check_events
>> @1439:
>> initiated connection to server [127.0.0.1:2181]
>>    615 2009-08-19 21:36:18,070:3813(0x41401960):ZOO_INFO@check_events
>> @1484:
>> connected to server [127.0.0.1:2181] with session id=1232c1688a20093
>>    616 2009-08-20
>> 02:48:01,780:3813(0x41401960):ZOO_WARN@zookeeper_interest@1335:
>> Exceeded deadline by 520ms
>>    617 2009-08-20
>> 03:08:52,332:3813(0x41401960):ZOO_WARN@zookeeper_interest@1335:
>> Exceeded deadline by 14ms
>>    618 2009-08-20
>> 03:09:04,666:3813(0x41401960):ZOO_WARN@zookeeper_interest@1335:
>> Exceeded deadline by 48ms
>>    619 2009-08-20
>> 03:09:09,733:3813(0x41401960):ZOO_WARN@zookeeper_interest@1335:
>> Exceeded deadline by 24ms
>>    620 *2009-08-20
>> 03:09:20,289:3813(0x41401960):ZOO_WARN@zookeeper_interest@1335: Exceeded
>> deadline by 264ms*
>>    621 *2009-08-20
>> 03:09:20,295:3813(0x41401960):ZOO_ERROR@handle_socket_error_msg@1388:
>> Socket
>> [127.0.0.1:2181] zk retcode=-7, errno=110(Connection timed out): conn
>> ection timed out (exceeded timeout by 264ms)*
>>    622 *2009-08-20
>> 03:09:20,309:3813(0x41401960):ZOO_WARN@zookeeper_interest@1335: Exceeded
>> deadline by 284ms*
>>    623 *2009-08-20
>> 03:09:20,309:3813(0x41401960):ZOO_ERROR@handle_socket_error_msg@1433:
>> Socket
>> [127.0.0.1:2182] zk retcode=-4, errno=111(Connection refused):
>> server         refused to accept the client*
>>    624 *2009-08-20 03:09:20,353:3813(0x41401960):ZOO_INFO@check_events
>> @1439:
>> initiated connection to server [127.0.0.1:2181]*
>>    625 *2009-08-20 03:09:20,552:3813(0x41401960):ZOO_INFO@check_events
>> @1484:
>> connected to server [127.0.0.1:2181] with session id=1232c1688a20093*
>>
>> I don't know why the "connection timed out" happened at *2009-08-20
>> 03:09:20,295:3813, *and the server refuse to accept the client. Could some
>> one give me any hints? And I'm not sure the meaning of "Exceeded deadline
>> by
>> xxms", need some help too.
>>
>>
>> P.S. I used the Zookeeper 3.2.0 (Server and C Client API) and run a
>> stand-alone instance
>>
>> Thx all~
>>
>> --
>> With Regards!
>>
>> Ye, Qian
>> Made in Zhejiang University
>>
> 
> 
> 

Mime
View raw message