zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Chan <daniel.cw.c...@oracle.com>
Subject RE: [3.4.6] Ephemeral node not deleted after session is gone
Date Tue, 03 Apr 2018 18:49:05 GMT
Hi Andor,

Please see my replies and requested information inline.

Thanks,
Daniel

-----Original Message-----
From: Andor Molnar [mailto:andor@cloudera.com] 
Sent: Tuesday, April 3, 2018 2:26 AM
To: user@zookeeper.apache.org
Subject: Re: [3.4.6] Ephemeral node not deleted after session is gone

There're a few questions on the original thread which might be useful to answer here as well:

1) Why is the session closed, the client closed it or the cluster expired it?
[Daniel Chan] in this case, the client got killed and we expect the session would be expired
by the cluster

2) which server was the session attached to - the first (44sec max
lat) or one of the others? Which server was the leader?
[Daniel Chan] The sessions creating the ephemeral nodes were attached to Server1 (443 max
latency) while Server2 is the leader

3) the znode exists on all 4 servers, is that right?
[Daniel Chan] The cluster has 2 members not 4, and the ephemeral nodes are present on both
servers

Would also be useful to attach server logs related to the session expiration as well as LogFormatter
output of txn log files about the nodes.
[Daniel Chan] Only found these logs from Server1 related to the sessions (0x162183ea9f70002
and 0x162183ea9f70003):
2018-03-12 03:28:35,127 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
- Accepted socket connection from /10.196.18.60:26775
2018-03-12 03:28:35,131 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822]
- Connection request from old client /10.196.18.60:26775; will be dropped if server is in
r-o mode
2018-03-12 03:28:35,131 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868]
- Client attempting to establish new session at /10.196.18.60:26775
2018-03-12 03:28:35,137 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established
session 0x162183ea9f70002 with negotiated timeout 9000 for client /10.196.18.60:26775

2018-03-12 03:30:36,415 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
- Accepted socket connection from /10.247.114.70:39260
2018-03-12 03:30:36,422 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822]
- Connection request from old client /10.247.114.70:39260; will be dropped if server is in
r-o mode
2018-03-12 03:30:36,423 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868]
- Client attempting to establish new session at /10.247.114.70:39260
2018-03-12 03:30:36,428 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established
session 0x162183ea9f70003 with negotiated timeout 9000 for client /10.247.114.70:39260

2018-03-31 01:29:58,865 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007]
- Closed socket connection for client /10.247.114.70:39260 which had sessionid 0x162183ea9f70003

Txn logs on the two ephemeral nodes /brokers/ids/707577499 and /brokers/ids/822712429:
3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x6 zxid 0x10000001b create '/brokers/ids,,v{s{31,s{'world,'anyone}}},F,1
3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x2c zxid 0x100000028 create '/brokers/ids/707577499,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235333135363931222c22686f7374223a22736c6331336e79692e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,1
3/11/18 8:30:36 PM PDT session 0x162183ea9f70003 cxid 0x14 zxid 0x100000030 create '/brokers/ids/822712429,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235343336393139222c22686f7374223a22736c6331336e796a2e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,2

Regards,
Andor


On Tue, Apr 3, 2018 at 10:34 AM, Andor Molnar <andor@cloudera.com> wrote:

> Hi Daniel,
>
> Thanks for the bugreport.
> Interesting that this issue should have been fixed already by ages:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org
> _jira_browse_ZOOKEEPER-2D1208&d=DwIBaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qI
> rMUB65eapI_JnE&r=JE3yjNS4hXa8nS9n2uFCwEqMvv18hzzEnqunUhCoEns&m=eycsMys
> gttgbjNW3XhfWJ8TgkcWxEFjljV-TpzD5EFU&s=ryVABxZ1RLdjrc3D4I6M5ZpST_jU6GV
> QDWoE4AH83L0&e=
>
> Regards,
> Andor
>
>
> On Tue, Apr 3, 2018 at 3:22 AM, Daniel Chan 
> <daniel.cw.chan@oracle.com>
> wrote:
>
>> We have a live Zookeeper environment (quorum size is 2) and observed 
>> a strange behavior:
>> Kafka created 2 ephemeral nodes /brokers/ids/822712429 and
>> /brokers/ids/707577499 on 2018-03-12 03:30:36.933 The Kafka clients 
>> were long gone but as of today, the two ephemeral nodes are still 
>> present
>>
>> Troubleshooting:
>> 1) Lists the outstanding sessions and ephemeral nodes $ echo dump | 
>> nc $SERVER1 2181 SessionTracker dump:
>> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
>> ephemeral nodes dump:
>> Sessions with Ephemerals (2):
>> 0x162183ea9f70003:
>>         /brokers/ids/822712429
>> 0x162183ea9f70002:
>>         /brokers/ids/707577499
>>         /controller
>>
>> 2) stat on /brokers/ids/822712429
>> zk> stat /brokers/ids/822712429
>> czxid: 4294967344
>> mzxid: 4294967344
>> pzxid: 4294967344
>> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
>> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
>> version: 0
>> cversion: 0
>> aversion: 0
>> owner: 99668799174148099
>> datalen: 102
>> children: 0
>>
>> 3) List full connection/session details for all clients connected $ 
>> echo cons | nc $SERVER1 2181
>>  /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
>>  /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=
>> 0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000,
>> lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117,
>> llat=0,minlat=0,avglat=0,maxlat=31)
>>
>> $ echo cons | nc $SERVER2 2181
>>  /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
>>  /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,
>> sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000
>> ,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,
>> minlat=0,avglat=0,maxlat=1235)
>>
>> 4) health
>> $ echo mntr | nc $SERVER1 2181
>> zk_version      3.4.6-1569965, built on 02/20/2014 09:09 GMT
>> zk_avg_latency  0
>> zk_max_latency  443
>> zk_min_latency  0
>> zk_packets_received     11158019
>> zk_packets_sent 11158244
>> zk_num_alive_connections        2
>> zk_outstanding_requests 0
>> zk_server_state follower
>> zk_znode_count  344
>> zk_watch_count  0
>> zk_ephemerals_count     3
>> zk_approximate_data_size        36654
>> zk_open_file_descriptor_count   33
>> zk_max_file_descriptor_count    65536
>>
>> 5) Could not find any special exception from zookeeper logs about the 
>> two sessions
>>
>> Is this a known bug in version 3.4.6? what could be the potential 
>> cause of the issue?
>>
>> Thanks,
>> Daniel
>>
>
>

Mime
View raw message