zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: One ensemble node shows massive number of 'Outstanding' requests
Date Tue, 24 Mar 2015 00:42:09 GMT
Not this, right?

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6900441
http://osdir.com/ml/hotspot-runtime-dev-java/2013-09/msg00006.html
https://bbossola.wordpress.com/2013/09/04/jvm-issue-concurrency-is-affected-by-changing-the-date-of-the-system/

Patrick


On Mon, Mar 23, 2015 at 5:00 PM, Jared Cantwell
<jared.cantwell@gmail.com> wrote:
> Greetings,
>
> We just saw this problem again, and this time we were able to capture a
> core file of the JVM using gdb.  I've run it through jstack and jmap to get
> a heap profile.  I can see that the FollowerZooKeeperServer has
> a requestsInProcess member that is ~24K.  I can also see that the
> CommitProcessor's queuedRequests list has the 24K items in it, so the
> FinalRequestProcessor's processRequest function is never getting called
> to complete the requests.
>
> The CommitProcessor's run() is doing this:
>
> Thread 23510: (state = BLOCKED)
>  - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be
> imprecise)
>  - org.apache.zookeeper.server.quorum.CommitProcessor.run() @bci=165,
> line=182 (Compiled frame)
>
> Based on the state, it made it to wait() because isWaitingForCommit()==true
> && committedRequests.isEmpty()==true.
>
> Strangely, detaching from the JVM must have woken this thread up, because
> the queue then flushed out as expected and everything returned to normal.
>
> I'll keep digging, but any help or direction would be appreciated as I'm
> not very familiar with this area of the codebase.
>
> Thanks!
> Jared
>
>
> On Tue, Feb 17, 2015 at 2:38 PM, Flavio Junqueira <
> fpjunqueira@yahoo.com.invalid> wrote:
>
>> It doesn't ring a bell, but it might be worth having a look at the logs to
>> see if there is anything unusual.
>>
>> Just to clarify, was the number of outstanding requests growing or constant?
>> I suppose the server was following/leading and operations were going
>> through; otherwise it would have dropped its connection to the leader or
>> given up leadership.
>>
>> -Flavio
>>
>> > On 17 Feb 2015, at 18:01, Marshall McMullen <marshall.mcmullen@gmail.com> wrote:
>> >
>> > Greetings,
>> >
>> > We saw an issue recently that I've never seen before and am hoping I can
>> > get some clarity on what may cause this and whether it's a known issue.
>> > We had a 5-node ensemble and were unable to connect to one of the
>> > ZooKeeper instances.  When trying to connect with zkCli it would time
>> > out. When I connected via telnet and issued the srvr four letter word,
>> > I was surprised to see that this one server reported a massive number of
>> > 'Outstanding' requests. I'd never seen that be anything other than 0
>> > before. On the ZK dev guide it says:
>> >
>> > "outstanding is the number of queued requests, this increases when the
>> > server is under load and is receiving more sustained requests than it can
>> > process, ie the request queue". I looked at all the ZK servers in my
>> > ensemble:
>> >
>> > for ip in 101 102 103 104 105; do echo srvr | nc 172.21.20.${ip} 2181 |
>> > grep Outstanding; done
>> > Outstanding: 0
>> > Outstanding: 0
>> > Outstanding: 0
>> > Outstanding: 0
>> > Outstanding: 18876
>> >
>> > I eventually killed ZK on the affected server, after which everything
>> > corrected itself: Outstanding went to zero and I was able to connect again.
>> >
>> > Is this something anyone's familiar with? I have logs if it would be
>> > helpful.
>> >
>> > Thanks!
>>
>>
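
The per-server check in the original report (echo srvr | nc ... | grep Outstanding) could also be scripted so that a backlog like the 18876 above is flagged automatically. The following is only a rough sketch under stated assumptions: the function names and the threshold of 1000 are hypothetical and not part of ZooKeeper, and the parser assumes the srvr reply contains a line of the form "Outstanding: <number>", as shown in the thread.

```python
import socket


def parse_outstanding(srvr_text):
    """Return the integer after 'Outstanding:' in srvr output, or None if absent."""
    for line in srvr_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Outstanding":
            return int(value.strip())
    return None


def query_srvr(host, port=2181, timeout=5.0):
    """Send the 'srvr' four-letter word and return the raw reply as text.

    Raises an OSError (for example a timeout) if the server does not answer.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def flag_backlogged(outstanding_by_host, threshold=1000):
    """Return hosts whose Outstanding count exceeds the (assumed) threshold."""
    return [
        host
        for host, count in outstanding_by_host.items()
        if count is not None and count > threshold
    ]
```

For the ensemble in the report, one would call parse_outstanding(query_srvr("172.21.20.105")) for each address and pass the resulting mapping to flag_backlogged. A server in the hung state described above may not answer at all, so the timeout error is itself a useful signal.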
