zookeeper-user mailing list archives

From Jared Cantwell <jared.cantw...@gmail.com>
Subject Re: One ensemble node shows massive number of 'Outstanding' requests
Date Tue, 24 Mar 2015 14:49:35 GMT
I do not see any evidence of a recent time jump or date change on this
node.  I will continue to investigate.
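
One way to watch for a clock change going forward is to compare wall-clock
elapsed time against monotonic elapsed time from inside a JVM on the node. A
minimal sketch, assuming a hypothetical ClockJumpCheck class and an arbitrary
500 ms tolerance (neither comes from ZooKeeper):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of ZooKeeper.
class ClockJumpCheck {

    // Wall-clock elapsed minus monotonic elapsed, in milliseconds.
    // A value far from zero suggests the system date was changed.
    static long driftMillis(long wallStartMs, long wallEndMs,
                            long monoStartNs, long monoEndNs) {
        long wallElapsedMs = wallEndMs - wallStartMs;
        long monoElapsedMs = TimeUnit.NANOSECONDS.toMillis(monoEndNs - monoStartNs);
        return wallElapsedMs - monoElapsedMs;
    }

    static boolean looksLikeJump(long driftMs, long toleranceMs) {
        return Math.abs(driftMs) > toleranceMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long toleranceMs = 500; // assumed threshold, tune as needed
        long wall = System.currentTimeMillis();
        long mono = System.nanoTime();
        for (int i = 0; i < 3; i++) {
            Thread.sleep(200);
            long wallNow = System.currentTimeMillis();
            long monoNow = System.nanoTime();
            long drift = driftMillis(wall, wallNow, mono, monoNow);
            if (looksLikeJump(drift, toleranceMs)) {
                System.out.println("possible clock jump, drift ms: " + drift);
            }
            wall = wallNow;
            mono = monoNow;
        }
    }
}
```

This only catches jumps that happen while the check is running, so it does not
replace looking through the system logs for past changes.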

~Jared

On Mon, Mar 23, 2015 at 6:42 PM, Patrick Hunt <phunt@apache.org> wrote:

> Not this, right?
>
> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6900441
> http://osdir.com/ml/hotspot-runtime-dev-java/2013-09/msg00006.html
>
> https://bbossola.wordpress.com/2013/09/04/jvm-issue-concurrency-is-affected-by-changing-the-date-of-the-system/
>
> Patrick
>
>
> On Mon, Mar 23, 2015 at 5:00 PM, Jared Cantwell
> <jared.cantwell@gmail.com> wrote:
> > Greetings,
> >
> > We just saw this problem again, and this time we were able to capture a
> > core file of the JVM using gdb.  I've run it through jstack and jmap to
> > get a heap profile.  I can see that the FollowerZooKeeperServer has
> > a requestsInProcess member that is ~24K.  I can also see that the
> > CommitProcessor's queuedRequests list has the 24K items in it, so the
> > FinalRequestProcessor's processRequest function isn't ever getting called
> > to complete the requests.
> >
> > The CommitProcessor's run() is doing this:
> >
> > Thread 23510: (state = BLOCKED)
> >  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
> >  - org.apache.zookeeper.server.quorum.CommitProcessor.run() @bci=165, line=182 (Compiled frame)
> >
> > Based on the state, it made it to wait() because
> > isWaitingForCommit()==true && committedRequests.isEmpty()==true.
> >
> > Strangely, once we detached from the JVM, it must have woken up this
> > thread and the queue flushed out as expected, bringing everything back to
> > normal.
> >
> > I'll keep digging, but any help or direction would be appreciated as I'm
> > not very familiar with this area of the codebase.
> >
> > Thanks!
> > Jared
> >
> >
> > On Tue, Feb 17, 2015 at 2:38 PM, Flavio Junqueira <fpjunqueira@yahoo.com.invalid> wrote:
> >
> >> It doesn't ring a bell, but it might be worth having a look at the logs to
> >> see if there is anything unusual.
> >>
> >> Just to clarify, was the number of outstanding requests growing or constant?
> >> I suppose the server was following/leading and operations were going
> >> through, otherwise it'd have dropped the connection to the leader or
> >> leadership.
> >>
> >> -Flavio
> >>
> >> > On 17 Feb 2015, at 18:01, Marshall McMullen <marshall.mcmullen@gmail.com> wrote:
> >> >
> >> > Greetings,
> >> >
> >> > We saw an issue recently that I've never seen before and am hoping I can
> >> > get some clarity on what may cause this and whether it's a known issue. We
> >> > had a 5 node ensemble and were unable to connect to one of the ZooKeeper
> >> > instances.  When trying to connect with zkCli it would time out. When I
> >> > connected via telnet and issued the srvr four letter word, I was surprised
> >> > to see that this one server reported a massive number of 'Outstanding'
> >> > requests. I'd never seen that really be anything other than 0 before. On
> >> > the ZK dev guide it says:
> >> >
> >> > "outstanding is the number of queued requests, this increases when the
> >> > server is under load and is receiving more sustained requests than it can
> >> > process, ie the request queue".
> >> >
> >> > I looked at all the ZK servers in my ensemble:
> >> >
> >> > for ip in 101 102 103 104 105; do echo srvr | nc 172.21.20.${ip} 2181 | grep Outstanding; done
> >> > Outstanding: 0
> >> > Outstanding: 0
> >> > Outstanding: 0
> >> > Outstanding: 0
> >> > Outstanding: 18876
> >> >
> >> > I eventually killed ZK on the affected server and everything corrected
> >> > itself and Outstanding went to zero and I was able to connect again.
> >> >
> >> > Is this something anyone's familiar with? I have logs if it would be
> >> > helpful.
> >> >
> >> > Thanks!
> >>
> >>
>
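
The srvr check from the original report quoted above can also be run from Java,
for example inside a monitoring job. A minimal sketch, assuming a hypothetical
OutstandingCheck class and that the srvr reply contains a line of the form
"Outstanding: N" as in the quoted output; port 2181 is the one used in the
thread, and the 5 second timeout is an arbitrary choice:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of ZooKeeper.
class OutstandingCheck {

    // Returns the number on the "Outstanding:" line, or -1 if there is no such line.
    static long parseOutstanding(String srvrOutput) {
        for (String line : srvrOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("Outstanding:")) {
                return Long.parseLong(trimmed.substring("Outstanding:".length()).trim());
            }
        }
        return -1;
    }

    // Sends a four letter word and reads the reply until the server closes the socket.
    static String sendCommand(String host, int port, String command, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // a hung server raises a timeout instead of blocking forever
            OutputStream out = socket.getOutputStream();
            out.write(command.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            InputStream in = socket.getInputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                reply.write(chunk, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java OutstandingCheck.java host [host ...]");
            return;
        }
        for (String host : args) {
            try {
                long outstanding = parseOutstanding(sendCommand(host, 2181, "srvr", 5000));
                System.out.println(host + " Outstanding: " + outstanding);
            } catch (IOException e) {
                System.out.println(host + " unreachable: " + e.getMessage());
            }
        }
    }
}
```

The read timeout matters here because the affected server in this thread
accepted connections slowly or not at all; without it the check itself could
hang.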
