zookeeper-user mailing list archives

From Chang Song <tru64...@me.com>
Subject Re: Serious problem processing heartbeat on login stampede
Date Thu, 14 Apr 2011 14:03:03 GMT



On Apr 14, 2011, at 10:30 AM, Patrick Hunt wrote:

> 2011/4/13 Chang Song <tru64ufs@me.com>:
>> 
>> Patrick.
>> Thank you for the reply.
>> 
>> We are very aware of all the things you mentioned below.
>> None of those.
>> 
>> Not GC (we monitor every possible resource in JVM and system)
>> No IO. No Swapping.
>> No VM guest OS. No logging.
>> 
> 
> Hm. ok, a few more ideas then:
> 
> 1) What is the connectivity like between the servers?
> 
> What is the ping time between them?
> 
> Is the system perhaps loading down the network during this test,
> causing network latency to increase? Are all the NICs (server and
> client) configured correctly? I've seen a number of cases where
> clients and/or servers had incorrectly configured NICs (ethtool
> reported 10 Mb/s half duplex for what should be gigabit Ethernet).


Nope. We are experts at these ;)
No issues on any of these fronts.



> 2) Regarding IO: if you run 'iostat -x 2' on the ZK servers while your
> issue is happening, what's the %util of the disk? What does the iowait
> look like?
> 

Again, no I/O at all: 0% utilization.



> 3) create a JIRA and upload your 3 server configuration files. Include
> the log4j.properties file you are using and any other details you
> think might be useful. If you can upload a log file from when you see
> the issue, that would be useful. If you can't get one from that exact
> time, upload whatever log file you have.
> 

I will have my team file a JIRA.



>> 
>> Oh, one thing I should mention is that it is not 1000 clients doing
>> 1000 logins/logouts per second. All operations, like closeSession and
>> ping, take more than 8 seconds (at peak).
>> 
> 
> Are you continuously logging in and then logging out, 1000 times per
> second? That's not a good use case for ZK sessions in general. Perhaps
> if you describe your use case in more detail it would help.
> 

Patrick, they do not continuously log in and out.
It happens maybe a couple of times a week, and before they push a new feature.
When this happens, clients in group A drop out of the cluster, which causes
problems for other, unrelated services.


It is not about the use case; the ZK clients simply tried to connect to the
ZK ensemble, so no particular use case applies. Many clients just log in at the
same time, expire at the same time, or close their sessions at the same time.


I am talking about the importance of the real-time nature of heartbeats,
especially when session timeouts are short, like 15 sec.

Heartbeats should be handled in an isolated queue by a
dedicated thread. I don't think we need to keep strict
ordering of heartbeats, do we?
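
To make it concrete, below is a rough sketch of what I have in mind. This is
not actual ZooKeeper server code; PingRequest and sendPingResponse() are
placeholders for whatever the server uses to answer a ping. The idea is that
heartbeats go into their own queue and are answered by a dedicated thread, so
a createSession/closeSession flood in the commit pipeline cannot delay them.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Sketch only: heartbeat requests bypass the commit pipeline and are
    // answered by a single dedicated thread.
    public class PingFastPath implements Runnable {

        // Placeholder for the server-side request object; in the real server
        // this would be whatever wraps an incoming packet.
        public interface PingRequest {
            void sendPingResponse();   // placeholder: write the ping reply to the client
        }

        private final BlockingQueue<PingRequest> pingQueue =
                new LinkedBlockingQueue<PingRequest>();

        // The normal request chain would call this when it sees OpCode.ping
        // and keep everything else flowing into the CommitProcessor.
        public void submit(PingRequest ping) {
            pingQueue.offer(ping);
        }

        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // Pings carry no state changes, so they do not need to be
                    // ordered with writes; just answer them promptly.
                    pingQueue.take().sendPingResponse();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        public static PingFastPath start() {
            PingFastPath fastPath = new PingFastPath();
            Thread t = new Thread(fastPath, "ping-fast-path");
            t.setDaemon(true);
            t.start();
            return fastPath;
        }
    }

Even a single such thread should keep ping latency independent of how backed
up the commit pipeline is.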

Thank you for your help, Patrick.



> Patrick
> 
>> It's about CommitProcessor thread queueing (in the leader).
>> QueuedRequests goes up to 800, and so do commitedRequests and
>> PendingRequestElapsedTime. PendingRequestElapsedTime
>> goes up to 8.8 seconds during this flood.
>> 
>> To reproduce this scenario exactly, the easiest way is to
>> 
>> - suspend all client JVMs with a debugger, or
>> - cause all client JVMs to OOME and create heap dumps
>> 
>> in group B. All clients in group A will then fail to receive a
>> ping response within 5 seconds.
>> 
>> We need to fix this as soon as possible.
>> What we do as a workaround is to raise sessionTimeout to 40 sec.
>> At least clients in group A survive, but this increases
>> our cluster failover time significantly.
>> 
>> Thank you, Patrick.
>> 
>> 
>> ps. We actually push the ping request to FinalRequestProcessor as soon
>>     as the packet identifies itself as a ping. No dice.
>> 
>> 
>> 
>> On Apr 14, 2011, at 12:21 AM, Patrick Hunt wrote:
>> 
>>> Hi Chang, it sounds like you may have an issue with your cluster
>>> environment/setup, or perhaps a resource (GC/mem) issue. Have you
>>> looked through the troubleshooting guide?
>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting
>>> 
>>> In particular, 1000 clients connecting should be fine; I've personally
>>> seen clusters of 7-10 thousand clients. Keep in mind that each session
>>> establishment is essentially a write (so the quorum is involved), and
>>> what we typically see there is that the cluster configuration has
>>> issues. 14 seconds for a ping response is huge and indicates one of
>>> the following may be an underlying cause:
>>> 
>>> 1) are you running in a virtualized environment?
>>> 2) are you co-locating other services on the same host(s) that make up
>>> the ZK serving cluster?
>>> 3) have you followed the admin guide's "things to avoid"?
>>> http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_commonProblems
>>> In particular, ensure that you are not swapping or going into GC
>>> pause (both on the server and the client):
>>> a) try turning on GC logging and ensure that you are not going into GC
>>> pause; see the troubleshooting guide. This is the most common cause of
>>> high latency for the clients.
>>> b) ensure that you are not swapping
>>> c) ensure that other processes are not causing log writing
>>> (transactional logging) to be slow.
>>> 
>>> Patrick
>>> 
>>> On Wed, Apr 13, 2011 at 6:35 AM, Chang Song <tru64ufs@me.com> wrote:
>>>> Hello, folks.
>>>> 
>>>> We have run into a very serious issue with ZooKeeper.
>>>> Here's a brief scenario.
>>>> 
>>>> We have some ZooKeeper clients with a session timeout of 15 sec (thus a
>>>> 5 sec ping interval); let's call these clients group A.
>>>> 
>>>> Now 1000 new clients (let's call these group B) start up at the same time,
>>>> trying to connect to a three-node ZK ensemble, creating a ZK createSession
>>>> stampede.
>>>> 
>>>> Now almost all clients in group A are unable to exchange pings within the
>>>> session expiration time (15 sec). Thus clients in group A drop out of the
>>>> cluster.
>>>> 
>>>> We have looked into this issue a bit and found that session queue processing
>>>> is mostly synchronous. Latency between a ping request and its response ranges
>>>> from 10 ms up to 14 seconds during this login stampede.
>>>> 
>>>> Since session timeout is a serious matter for our cluster, pings should be
>>>> handled in a pseudo-realtime fashion.
>>>> 
>>>> I don't know exactly how the ping timeout policy works in the clients and
>>>> the server, but clients failing to receive ping responses because of a
>>>> ZooKeeper login/session stampede makes no sense to me.
>>>> 
>>>> Shouldn't we have a separate ping/heartbeat queue and thread?
>>>> Or even multiple ping queues/threads to keep heartbeats realtime?
>>>> 
>>>> This is a very serious issue with ZooKeeper for our mission-critical
>>>> system. Could anyone look into this?
>>>> 
>>>> I will try to file a bug.
>>>> 
>>>> Thank you.
>>>> 
>>>> Chang
>>>> 
>>>> 
>>>> 
>> 
>> 

