zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Hunt <ph...@apache.org>
Subject Re: ephemeral node not deleted after client session closed
Date Thu, 10 Nov 2011 19:54:54 GMT
ps. was there any instability in the quorum itself during this time
period? Attaching logs to 1208 would be helpful.

Patrick

On Thu, Nov 10, 2011 at 11:50 AM, Patrick Hunt <phunt@apache.org> wrote:
> Great report, some addl questions:
>
> 1) Why is the session closed, the client closed it or the cluster expired it?
>
> 2) which server was the session attached to - the first (44sec max
> lat) or one of the others? Which server was the leader?
>
> 3) the znode exists on all 4 servers, is that right?
>
> 4) it's clear that a znode is being created for a session that's just
> been closed. This is certainly a bug. Perhaps a timing issue due to
> the huge latency hit that you reported.
>
> 5) why are your max latencies, as well as avg latency, so high?
> a) are these dedicated boxes, not virtualized, correct?
> b) is the jvm going into gc pause? (try turning on verbose logging, or
> use "jstat" with the gc options to see the history if you still have
> those jvms running)
> c) swapping?
> d) do you have dedicated spindles for the ZK WAL? If not another
> process might be causing the fsyncs to pause. (you can use iostat or
> strace to monitor this)
>
> My guess is b given it's the time btw the commit processor and
> finalreq processor that's so long. Is that the log from the server
> that's got the 44sec max latency?
>
> Patrick
>
> On Thu, Nov 10, 2011 at 11:15 AM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
>> Hi,
>>
>> We are seeing this issue again with our Kafka zookeeper setup. Let me
>> explain with the following log entries -
>>
>> Here is the node that didn't get deleted -
>>
>> get /kafka-tracking/brokers/ids/528
>> 172.18.98.101-1320781574264:172.18.98.101:10251
>> cZxid = 0x605f90a78ctime = Tue Nov 08 11:46:26 PST 2011
>> mZxid = 0x605f90a78mtime = Tue Nov 08 11:46:26 PST 2011
>> pZxid = 0x605f90a78cversion = 0
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x23220b93b4a33bc
>> dataLength = 47
>> numChildren = 0
>>
>> I ran Kishore's awesome version of the LogFormatter tool on the
>> relevant portion of the zookeeper transaction logs -
>>
>> time:11/8/11 11:46:14 AM PST session:0x23220b93b4a33bc cxid:0x0
>> zxid:0x605f90a59 type:createSession timeOut:6000
>> time:11/8/11 11:46:26 AM PST session:0x23220b93b4a33bc cxid:0x0
>> zxid:0x605f90a72 type:closeSession
>> time:11/8/11 11:46:26 AM PST session:0x23220b93b4a33bc cxid:0x1
>> zxid:0x605f90a78 type:create acl:[31,s{'world,'anyone}
>>
>> We want to understand why there was a 12 second delay in creation of
>> znode after createSession
>>
>> Checking the logs, we found that the there is a 30+ sec delay between
>> the commitProcessor and FinalRequestProcessor.
>>
>> 2011-11-08 11:46:14,661 - DEBUG
>> [QuorumPeer:/0:0:0:0:0:0:0:0:12913:CommitProcessor@159] - Committing
>> request:: sessionid:0x23220b93b4a33bc type:createSession
>>  cxid:0x0 zxid:0x605f90a59 txntype:-10 reqpath:n/a
>> 2011-11-08 11:46:26,045 - DEBUG
>> [QuorumPeer:/0:0:0:0:0:0:0:0:12913:CommitProcessor@159] - Committing
>> request:: sessionid:0x23220b93b4a33bc type:closeSession
>> cxid:0x0 zxid:0x605f90a72 txntype:-11 reqpath:n/a
>> 2011-11-08 11:46:26,047 - DEBUG
>> [QuorumPeer:/0:0:0:0:0:0:0:0:12913:CommitProcessor@159] - Committing
>> request:: sessionid:0x23220b93b4a33bc type:create cxid:0
>> x1 zxid:0x605f90a78 txntype:1 reqpath:n/a
>> 2011-11-08 11:46:59,539 - DEBUG
>> [CommitProcessor:1:FinalRequestProcessor@78] - Processing request::
>> sessionid:0x23220b93b4a33bc type:createSession cxid:0x0 z
>> xid:0x605f90a59 txntype:-10 reqpath:n/a
>> 2011-11-08 11:46:59,587 - DEBUG
>> [CommitProcessor:1:FinalRequestProcessor@78] - Processing request::
>> sessionid:0x23220b93b4a33bc type:closeSession cxid:0x0 zx
>> id:0x605f90a72 txntype:-11 reqpath:n/a
>> 2011-11-08 11:46:59,589 - DEBUG
>> [CommitProcessor:1:FinalRequestProcessor@78] - Processing request::
>> sessionid:0x23220b93b4a33bc type:create cxid:0x1 zxid:0x6
>> 05f90a78 txntype:1 reqpath:n/a
>>
>> Here is the stat output from our zookeeper servers -
>>
>> Latency min/avg/max: 0/53/44712
>> Latency min/avg/max: 0/11/11727
>> Latency min/avg/max: 0/12/11994
>> Latency min/avg/max: 0/9/11707
>>
>> The relevant bug is here - https://issues.apache.org/jira/browse/ZOOKEEPER-1208
>>
>> This is a pretty serious issue since the fix is to have someone
>> manually delete the ephemeral node. And until that is done, it blocks
>> the startup of either the consumer client or the Kafka servers. Any
>> help here in debugging this issue is appreciated !
>>
>> Thanks,
>> Neha
>>
>

Mime
View raw message