It doesn't ring a bell, but it might be worth having a look at the logs to see if there is
anything unusual.
Just to clarify, was the number of outstanding requests growing or constant? I suppose the server
was following or leading and operations were going through; otherwise it would have dropped the
connection to the leader or given up leadership.
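
If it happens again, one rough way to tell growing from constant is to sample the counter a few
times in a row. Below is a minimal sketch, assuming nc is available and the client port is 2181 as
in your loop. The host, sample count, and interval are placeholder values, and the function names
are made up for illustration.

```shell
#!/bin/sh
# Sketch only: sample the Outstanding counter reported by the srvr four letter word.
# HOST, PORT, SAMPLES and INTERVAL are placeholders; adjust them for your ensemble.
HOST=${HOST:-172.21.20.105}
PORT=${PORT:-2181}
SAMPLES=${SAMPLES:-5}
INTERVAL=${INTERVAL:-2}

# Read srvr output on stdin and print the number from the "Outstanding: N" line.
parse_outstanding() {
    awk -F': *' '/^Outstanding:/ { print $2 }'
}

# Query the server SAMPLES times, INTERVAL seconds apart, printing one line per sample.
sample_outstanding() {
    i=0
    while [ "$i" -lt "$SAMPLES" ]; do
        n=$(echo srvr | nc "$HOST" "$PORT" | parse_outstanding)
        echo "$(date +%T) outstanding=${n:-unavailable}"
        i=$((i + 1))
        sleep "$INTERVAL"
    done
}

# Only touch the network when explicitly asked to: sh script.sh run
if [ "${1:-}" = "run" ]; then
    sample_outstanding
fi
```

A value that keeps climbing between samples would suggest the server is still accepting requests
but not draining its queue, while a flat value would suggest it is stuck altogether.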
-Flavio
> On 17 Feb 2015, at 18:01, Marshall McMullen <marshall.mcmullen@gmail.com> wrote:
>
> Greetings,
>
> We saw an issue recently that I've never seen before and am hoping I can
> get some clarity on what may cause this and whether it's a known issue. We
> had a 5 node ensemble and were unable to connect to one of the ZooKeeper
> instances. When trying to connect with zkCli it would time out. When I
> connected via telnet and issued the srvr four letter word, I was surprised
> to see that this one server reported a massive number of 'Outstanding'
> requests. I'd never seen that really be anything other than 0 before. On
> the ZK dev guide it says:
>
> "outstanding is the number of queued requests, this increases when the
> server is under load and is receiving more sustained requests than it can
> process, ie the request queue". I looked at all the ZK servers in my
> ensemble:
>
> for ip in 101 102 103 104 105; do echo srvr | nc 172.21.20.${ip} 2181 |
> grep Outstanding; done
> Outstanding: 0
> Outstanding: 0
> Outstanding: 0
> Outstanding: 0
> Outstanding: 18876
>
> I eventually killed ZK on the affected server, after which everything
> corrected itself: Outstanding went to zero and I was able to connect again.
>
> Is this something anyone's familiar with? I have logs if it would be
> helpful.
>
> Thanks!