zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Serious problem processing heartbeat on login stampede
Date Thu, 14 Apr 2011 04:53:35 GMT
two additional thoughts come to mind:

1) try running the ensemble with a single ZK server. Does this help at
all? (It might provide a short-term workaround, and it might also
provide some insight into what's causing the issue.)

2) can you hold off some of the clients from the stampede? Perhaps add
a random holdoff to each of the clients before connecting, and
additionally a similar random holdoff before closing the session. This
seems like a straightforward change on your client side (easy to
implement/try), but it's hard to tell given we don't have much insight
into what your use case is.
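
For illustration, a minimal sketch of the jitter idea using the
standard Java client (the connect string, 15s timeout, and 5 second
window below are just placeholders):

    import java.util.Random;

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class JitteredClient {
        public static void main(String[] args) throws Exception {
            Random random = new Random();

            // Spread connection attempts over a window (5s here is just
            // an example) so the ensemble doesn't see every session
            // arrive at once -- session creation is a quorum write.
            Thread.sleep(random.nextInt(5000));

            ZooKeeper zk = new ZooKeeper("host1:2181,host2:2181,host3:2181",
                    15000, new Watcher() {
                        public void process(WatchedEvent event) {
                            // watch for connection state changes here
                        }
                    });

            // ... application work ...

            // Similar jitter before close, since closeSession is also a
            // quorum write and can stampede the same way.
            Thread.sleep(random.nextInt(5000));
            zk.close();
        }
    }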


Anyone else in the community have any ideas?


Patrick

2011/4/13 Patrick Hunt <phunt@apache.org>:
> 2011/4/13 Chang Song <tru64ufs@me.com>:
>>
>> Patrick.
>> Thank you for the reply.
>>
>> We are very aware of all the things you mentioned below.
>> None of those.
>>
>> Not GC (we monitor every possible resource in JVM and system)
>> No IO. No Swapping.
>> No VM guest OS. No logging.
>>
>
> Hm. ok, a few more ideas then:
>
> 1) what is the connectivity like between the servers?
>
> What is the ping time between them?
>
> Is the system perhaps loading down the network during this test,
> causing network latency to increase? Are all the NICs (server and
> client) configured correctly? I've seen a number of cases where
> clients and/or servers had incorrectly configured NICs (ethtool
> reported 10 Mb/sec half duplex for what should be gigabit Ethernet)
>
> 2) regarding IO, if you run 'iostat -x 2' on the ZK servers while your
> issue is happening, what's the %util of the disk? What does the iowait
> look like?
>
> 3) create a JIRA and upload your 3 server configuration files. Include
> the log4j.properties file you are using and any other details you
> think might be useful. If you can, upload a log file from when you see
> the issue; if you can't get one from that time, upload any log file.
>
>>
>> Oh, one thing I should mention is that it is not 1000 clients doing
>> 1000 logins/logouts per second. All operations, like closeSession and
>> ping, take more than 8 seconds (at peak).
>>
>
> Are you continuously logging in and then logging out, 1000 times per
> second? That's not a good use case for ZK sessions in general; sessions
> are meant to be long-lived. Perhaps if you describe your use case in
> more detail it would help.
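>
> For example, a sketch of sharing one long-lived session per process
> instead of a session per operation (SharedSession, the hosts, and the
> timeout are made up for illustration):
>
>     import org.apache.zookeeper.WatchedEvent;
>     import org.apache.zookeeper.Watcher;
>     import org.apache.zookeeper.ZooKeeper;
>
>     // Hypothetical holder: one shared session per JVM. Session
>     // create/close are quorum writes, so avoiding per-operation
>     // sessions removes most of that write load.
>     public class SharedSession {
>         private static ZooKeeper zk;
>
>         public static synchronized ZooKeeper get() throws Exception {
>             if (zk == null) {
>                 zk = new ZooKeeper("host1:2181,host2:2181,host3:2181",
>                         15000, new Watcher() {
>                             public void process(WatchedEvent event) {
>                                 // handle Disconnected/Expired here
>                             }
>                         });
>             }
>             return zk;
>         }
>     }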
>
> Patrick
>
>> It's about CommitProcessor thread queueing (in the leader).
>> QueuedRequests goes up to 800, and so do committedRequests and
>> PendingRequestElapsedTime. PendingRequestElapsedTime
>> goes up to 8.8 seconds during this flood.
>>
>> To reproduce this scenario exactly, the easiest way is to:
>>
>> - suspend all client JVMs with a debugger
>> - cause all client JVMs to OOME so they create heap dumps
>>
>> in group B. All clients in group A will then fail to receive a
>> ping response within 5 seconds.
>>
>> We need to fix this as soon as possible.
>> What we do as a workaround is to raise sessionTimeout to 40 sec.
>> At least clients in group A survive, but this increases
>> our cluster failover time significantly.
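>>
>> (For reference, the timeout is just the second argument to the Java
>> client constructor; a sketch with placeholder hosts:)
>>
>>     // Sketch: requesting a 40s session timeout. Note the server
>>     // clamps the request to [minSessionTimeout, maxSessionTimeout],
>>     // by default 2x and 20x tickTime, so with the default 2000ms
>>     // tickTime 40s is exactly the maximum the ensemble will grant.
>>     ZooKeeper zk = new ZooKeeper("host1:2181,host2:2181,host3:2181",
>>             40000, new Watcher() {
>>                 public void process(WatchedEvent event) { }
>>             });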
>>
>> Thank you, Patrick.
>>
>>
>> ps. We actually tried pushing ping requests to FinalRequestProcessor as
>>      soon as a packet identifies itself as a ping. No dice.
>>
>>
>>
>> On Apr 14, 2011, at 12:21 AM, Patrick Hunt wrote:
>>
>>> Hi Chang, it sounds like you may have an issue with your cluster
>>> environment/setup, or perhaps a resource (GC/mem) issue. Have you
>>> looked through the troubleshooting guide?
>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting
>>>
>>> In particular, 1000 clients connecting should be fine; I've personally
>>> seen clusters of 7-10 thousand clients. Keep in mind that each session
>>> establishment is essentially a write (so the quorum is involved), and
>>> what we typically see there is that the cluster configuration has
>>> issues. 14 seconds for a ping response is huge and indicates one of
>>> the following may be an underlying cause:
>>>
>>> 1) are you running in a virtualized environment?
>>> 2) are you co-locating other services on the same host(s) that make up
>>> the ZK serving cluster?
>>> 3) have you followed the admin guide's "things to avoid"?
>>> http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_commonProblems
>>> In particular, ensure that you are not swapping or going into GC
>>> pause (both on the server and the client):
>>> a) try turning on GC logging and ensure that you are not going into GC
>>> pause; see the troubleshooting guide. This is the most common cause of
>>> high latency for the clients (example flags after this list)
>>> b) ensure that you are not swapping
>>> c) ensure that other processes are not causing log writing
>>> (transactional logging) to be slow.
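>>>
>>> For (a), something along these lines works on the HotSpot JVM (the
>>> log path and app jar are placeholders):
>>>
>>>     java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
>>>          -Xloggc:/path/to/gc.log -jar yourapp.jar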
>>>
>>> Patrick
>>>
>>> On Wed, Apr 13, 2011 at 6:35 AM, Chang Song <tru64ufs@me.com> wrote:
>>>> Hello, folks.
>>>>
>>>> We have run into a very serious issue with ZooKeeper.
>>>> Here's a brief scenario.
>>>>
>>>> We have some ZooKeeper clients with a session timeout of 15 sec (thus
>>>> a 5 sec ping interval); let's call these clients group A.
>>>>
>>>> Now 1000 new clients (let's call these group B) start up at the same
>>>> time trying to connect to a three-node ZK ensemble, creating a ZK
>>>> createSession stampede.
>>>>
>>>> Now almost all clients in group A are unable to exchange pings within
>>>> the session expiry time (15 sec). Thus clients in group A drop out of
>>>> the cluster.
>>>>
>>>> We have looked into this issue a bit and found it is mostly due to the
>>>> synchronous nature of session queue processing. Latency between a ping
>>>> request and its response ranges from 10 ms up to 14 seconds during
>>>> this login stampede.
>>>>
>>>> Since session timeout is a serious matter for our cluster, pings
>>>> should be handled in a pseudo-realtime fashion.
>>>>
>>>> I don't know exactly how the ping timeout policy works in the clients
>>>> and the server, but clients failing to receive ping responses because
>>>> of ZooKeeper login sessions makes no sense to me.
>>>>
>>>> Shouldn't we have a separate ping/heartbeat queue and thread?
>>>> Or even multiple ping queues/threads to keep the heartbeat realtime?
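>>>>
>>>> To illustrate what I mean (purely a sketch, not ZooKeeper's actual
>>>> pipeline; all names here are made up): pings get their own queue and
>>>> thread, so a flood of session-creation writes can't delay heartbeats:
>>>>
>>>>     import java.util.concurrent.BlockingQueue;
>>>>     import java.util.concurrent.LinkedBlockingQueue;
>>>>
>>>>     public class PingDispatcher {
>>>>         // Placeholder for a ping packet; real code would carry the
>>>>         // session id plus the connection to respond on.
>>>>         static class Ping {
>>>>             final long sessionId;
>>>>             Ping(long sessionId) { this.sessionId = sessionId; }
>>>>         }
>>>>
>>>>         private final BlockingQueue<Ping> pings =
>>>>                 new LinkedBlockingQueue<Ping>();
>>>>
>>>>         public PingDispatcher() {
>>>>             Thread t = new Thread(new Runnable() {
>>>>                 public void run() {
>>>>                     try {
>>>>                         while (true) {
>>>>                             Ping p = pings.take();
>>>>                             // Respond immediately, bypassing the
>>>>                             // commit/request pipeline entirely.
>>>>                             System.out.println("pong " + p.sessionId);
>>>>                         }
>>>>                     } catch (InterruptedException e) {
>>>>                         Thread.currentThread().interrupt();
>>>>                     }
>>>>                 }
>>>>             });
>>>>             t.setDaemon(true);
>>>>             t.start();
>>>>         }
>>>>
>>>>         public void submit(Ping p) { pings.add(p); }
>>>>     }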
>>>>
>>>> This is a very serious issue with ZooKeeper for our mission-critical
>>>> system. Could anyone look into this?
>>>>
>>>> I will try to file a bug.
>>>>
>>>> Thank you.
>>>>
>>>> Chang
>>>>
>>>>
>>>>
>>
>>
>
