Hi Stack,
The other interesting part is the session
0x26ed968d880001.
It looks like it gets disconnected from one of the servers (TIMEOUT). Do you
see any "Attempting connection to server" messages in the logs
before you see all the consecutive
org.apache.zookeeper.ClientCnxn: Exception closing session 0x26ed968d880001 to sun.nio.ch.SelectionKeyImpl@788ab708
java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
and so on
from the client 0x26ed968d880001?
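
In case it helps, here is a rough sketch for pulling those lines out of a client log and checking their order. It is only an illustration: the function names are made up, and it assumes each log record sits on a single line in the file (the wrapping above is from the mail client) and that the message texts match the ones quoted in this thread.

```python
# Hypothetical helper, not part of ZooKeeper or HBase.
SESSION_ID = "0x26ed968d880001"                    # session id from this thread
RECONNECT_MSG = "Attempting connection to server"  # message text quoted above
CLOSE_MSG = "Exception closing session"            # message text quoted above


def scan_client_log(lines, session_id=SESSION_ID):
    """Return (line_number, kind, text) tuples, in log order, for
    reconnect attempts and for session-close errors on session_id."""
    events = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if RECONNECT_MSG in text:
            events.append((number, "reconnect", text))
        elif CLOSE_MSG in text and session_id in text:
            events.append((number, "close", text))
    return events


def reconnects_before_first_close(events):
    """Count reconnect attempts logged before the first session-close error."""
    count = 0
    for _, kind, _ in events:
        if kind == "close":
            break
        if kind == "reconnect":
            count += 1
    return count
```

Passing an open file object as `lines` works, since iterating a file yields one line at a time. A count of zero from `reconnects_before_first_close` would suggest the close errors start without any logged reconnect attempt.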
Thanks
mahadev
On 2/22/10 11:42 AM, "Stack" <stack@duboce.net> wrote:
> The thing that seems odd to me is that the connectivity complaints are
> out of the zk client, right? Why is it failing to get to member 14,
> and why not move to another ensemble member if there is an issue w/ 14? And if
> there were a general connectivity issue, I'd think that the running
> hbase cluster would be complaining at about the same time (it's talking
> to datanodes and masters at this time).
>
> (Thanks for the input lads)
>
> St.Ack
>
>
> On Mon, Feb 22, 2010 at 11:26 AM, Mahadev Konar <mahadev@yahoo-inc.com> wrote:
>> I also looked at the logs. Ted might have a point. It does look like the
>> zookeeper servers are doing fine (though, as Ted mentions, the skew is a
>> little concerning; that might be due to very few packets served by
>> the first server). Other than that, latencies of 300 ms at max should not
>> cause any timeouts.
>> Also, the number of packets received is pretty low, meaning that it wasn't
>> serving huge traffic. Is there any way we can check whether the network connection
>> from the client to the server is flaky?
>>
>> Thanks
>> mahadev
>>
>>
>> On 2/22/10 10:40 AM, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>>
>>> Not sure this helps at all, but these times are remarkably asymmetrical. I
>>> would expect members of a ZK cluster to have very comparable times.
>>>
>>> Additionally, 345 ms is nowhere near large enough to cause a session to
>>> expire. My take is that ZK doesn't think it caused the timeout.
>>>
>>> On Mon, Feb 22, 2010 at 10:18 AM, Stack <stack@duboce.net> wrote:
>>>
>>>> Latency min/avg/max: 2/125/345
>>>> ...
>>>> Latency min/avg/max: 0/7/81
>>>> ...
>>>> Latency min/avg/max: 1/1/1
>>>>
>>>> Thanks for any pointers on how to debug.
>>>>
>>
>>