zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Hunt <ph...@apache.org>
Subject Re: Unexpected delay between pings sent from the client to server
Date Mon, 08 Apr 2019 04:01:06 GMT
Try turning on verbose GC logging and look if it's GC. In my experience
that's often the cause. "client application suddenly loses
its Zookeeper session" - might be client GC, or it could be server GC. See
if either of those shed light.

Patrick

On Sat, Apr 6, 2019 at 1:10 PM Norbert Kalmar <nkalmar@cloudera.com.invalid>
wrote:

> Hi Gelbana,
>
> max_latency tells you the time elapsed between creating the request and
> FinalRequestProcessor processing it. So the cause for being that high could
> be basically anything.
> Turning on debug log for ZooKeeper server could help pinpoint at what point
> the request get stuck for so long.
>
> Regards,
> Norbert
>
> On Wed, Apr 3, 2019 at 3:07 PM Muhammad Gelbana <m.gelbana@gmail.com>
> wrote:
>
> > Another couple of things I found:
> >
> > *A couple of Zookeeper client threads are stuck at these stacktraces for
> > ~30 seconds*
> > "pool-2-thread-1-EventThread" #1218 daemon prio=5 os_prio=0
> > tid=0x00007ff3f5e23800 nid=0x5cd8 waiting on condition
> [0x00007ff3ef803000]
> >    java.lang.Thread.State: WAITING (parking)
> >     at sun.misc.Unsafe.park(Native Method)
> >     - parking to wait for  <0x000000018d6d8ed8> (a
> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> >     at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> >     at
> >
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> >     at
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:504)
> >
> > "pool-2-thread-1-SendThread(72.55.136.25:2181)" #1217 daemon prio=5
> > os_prio=0 tid=0x00007ff3f5e23000 nid=0x5cd7 runnable [0x00007ff3ef904000]
> >    java.lang.Thread.State: RUNNABLE
> >     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> >     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> >     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> >     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> >     - locked <0x000000018d68a730> (a sun.nio.ch.Util$3)
> >     - locked <0x000000018d68a720> (a
> java.util.Collections$UnmodifiableSet)
> >     - locked <0x000000018d68a258> (a sun.nio.ch.EPollSelectorImpl)
> >     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> >     at
> >
> >
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
> >     at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)
> >
> > *Running the mntr command returned the following stats*
> > zk_version    3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on
> > 06/29/2018 04:05 GMT
> > zk_avg_latency    0
> > zk_max_latency    *17657*
> > zk_min_latency    0
> > zk_packets_received    1427134
> > zk_packets_sent    1596974
> > zk_num_alive_connections    64
> > zk_outstanding_requests    0
> > zk_server_state    follower
> > zk_znode_count    1394
> > zk_watch_count    592
> > zk_ephemerals_count    192
> > zk_approximate_data_size    181257
> > zk_open_file_descriptor_count    94
> > zk_max_file_descriptor_count    1048576
> > zk_fsync_threshold_exceed_count    1
> >
> > I find the *zk_max_latency* extremely hight. I'm wondering what kind of
> > latency is that ? How can I debug the reason for this value ?
> >
> > Thanks,
> > Gelbana
> >
> >
> >
> > On Wed, Apr 3, 2019 at 1:42 PM Muhammad Gelbana <m.gelbana@gmail.com>
> > wrote:
> >
> > > I'm trying to debug a problem where our client application suddenly
> loses
> > > its Zookeeper session. I concluded that by looking at the Zookeeper
> > server
> > > logs.
> > >
> > > I increased the logging details for the client and found the following
> > log
> > > messages
> > >
> > >> DEBUG: [07:33:33] [demo | HA | Manager] Got ping response for
> sessionid:
> > >> 0x3000da76fa904b6 after 0ms
> > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
> > >> DEBUG: [07:34:07] [demo | HA | Manager] Got ping response for
> sessionid:
> > >> 0x3000da76fa904b6 after 0ms
> > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
> > >> DEBUG: [07:34:40] [demo | HA | Manager] Got ping response for
> sessionid:
> > >> 0x3000da76fa904b6 after 0ms
> > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
> > >> DEBUG: [07:35:13] [demo | HA | Manager] Got ping response for
> sessionid:
> > >> 0x3000da76fa904b6 after 0ms
> > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
> > >> DEBUG: [07:35:47] [demo | HA | Manager] Got ping response for
> sessionid:
> > >> 0x3000da76fa904b6 after 0ms
> > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
> > >> DEBUG: [07:36:20] [demo | HA | Manager] Got ping response for
> sessionid:
> > >> 0x3000da76fa904b6 after 0ms
> > >> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
> > >>
> > >
> > > I noticed that the duration between each log message is ~33 seconds
> while
> > > on another environment (my laptop), the duration goes down to ~1
> second.
> > > What could be causing this huge difference ? I doubt that whatever is
> > > causing this effect causes the delay to increase significantly at some
> > > point to the extend that makes my client lose its session.
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message