zookeeper-user mailing list archives

From Patrick Hunt <phu...@gmail.com>
Subject Re: ephemeral node not deleted after client session closed
Date Thu, 10 Nov 2011 22:08:18 GMT
On Thu, Nov 10, 2011 at 1:52 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
> Thanks for the quick responses, guys! Please find my replies inline -
>
>>> 1) Why is the session closed, the client closed it or the cluster
>>> expired it?
> Cluster expired it.
>

Yes, I realized afterwards that the cxid is 0 in your logs, which
indicates the session was expired rather than closed explicitly by the
client.


>>> 3) the znode exists on all 4 servers, is that right?
> Yes
>

This supports my theory that the PrepRequestProcessor is accepting a
create from the client after the session has expired.
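
To make the theory concrete, here is a toy sketch of the suspected
sequence. It is not ZooKeeper's code: ToySessionTracker, ToyServer and
the validate_session flag are hypothetical names invented for
illustration, and the model ignores quorum, ordering and timing. It
only shows why a create accepted after expiry would leave an ephemeral
node with no live owner to trigger its cleanup.

```python
class ToySessionTracker:
    """Hypothetical stand-in for server-side session bookkeeping."""

    def __init__(self):
        self.live = set()

    def open(self, session_id):
        self.live.add(session_id)

    def expire(self, session_id):
        self.live.discard(session_id)

    def is_live(self, session_id):
        return session_id in self.live


class ToyServer:
    """Toy model only; validate_session is an illustrative switch."""

    def __init__(self, validate_session):
        self.sessions = ToySessionTracker()
        self.ephemerals = {}  # path -> owning session id
        self.validate_session = validate_session

    def expire_session(self, session_id):
        # Expiry removes the session and every ephemeral node it owns.
        self.sessions.expire(session_id)
        owned = [p for p, owner in self.ephemerals.items() if owner == session_id]
        for path in owned:
            del self.ephemerals[path]

    def create_ephemeral(self, session_id, path):
        # The suspected gap: without this check, a create that arrives
        # after expiry is accepted and nothing will ever clean it up.
        if self.validate_session and not self.sessions.is_live(session_id):
            raise RuntimeError("session expired")
        self.ephemerals[path] = session_id


def late_create(validate_session):
    """Expire session 1, then replay a create that arrives afterwards."""
    server = ToyServer(validate_session)
    server.sessions.open(1)
    server.expire_session(1)
    try:
        server.create_ephemeral(1, "/example")
    except RuntimeError:
        pass
    return server.ephemerals
```

In this model late_create(False) leaves "/example" behind, owned by a
session that no longer exists, while late_create(True) rejects the
create and leaves nothing.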

>>> 5) why are your max latencies, as well as avg latency, so high?
>>> a) are these dedicated boxes, not virtualized, correct?
> these are dedicated boxes, but zk is currently co-located with kafka, but
> on different disks
>
>>> b) is the jvm going into gc pause? (try turning on verbose logging, or
>>> use "jstat" with the gc options to see the history if you still have
>>> those jvms running)
> I don't believe we had gc logs on these machines, so it's unclear.
>
>>> d) do you have dedicated spindles for the ZK WAL? If not another
>>> process might be causing the fsyncs to pause. (you can use iostat or
>>> strace to monitor this)
> No. The log4j and zk txn logs share the same disks.
>
>>> Is that the log from the server that's got the 44sec max latency?
> Yes.
>
>>> This is 3.3.3 ?
> Yes.
>
>>> was there any instability in the quorum itself during this time
>>> period?
> How do I find that out ?

The logs would indicate if an election happens. Look for "LOOKING" or
"LEADING" or "FOLLOWING".
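
A minimal sketch of that search, assuming the server log is available
as plain text. The function names and the sample lines are made up for
illustration and are not real ZooKeeper output; only the three state
names come from the message above.

```python
import re
from collections import Counter

# The three state names mentioned above, matched as whole words.
STATE_PATTERN = re.compile(r"\b(LOOKING|LEADING|FOLLOWING)\b")


def find_state_mentions(lines):
    """Return (line_number, state, text) for each line naming a state."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STATE_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1), line.rstrip("\n")))
    return hits


def summarize(hits):
    """Count how often each state name appears."""
    return Counter(state for _, state, _ in hits)


# Made-up sample lines, NOT real ZooKeeper log output.
SAMPLE_LOG = [
    "10:00:01 example line: server is LOOKING\n",
    "10:00:02 example line: unrelated message\n",
    "10:00:03 example line: server is FOLLOWING\n",
]

sample_hits = find_state_mentions(SAMPLE_LOG)
```

To scan a real log, pass an open file object to find_state_mentions.
A burst of LOOKING entries during the period in question would suggest
the quorum was re-electing a leader at that time.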


Your comments are consistent with my theory. This looks to me like a
bug in PrepRequestProcessor (PRP) session validation.

Patrick
