zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Han (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2867) an expired ZK session can be re-established
Date Thu, 10 Aug 2017 17:22:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121935#comment-16121935
] 

Michael Han commented on ZOOKEEPER-2867:
----------------------------------------

bq. That Kafka broker went down around Aug 2, 23:16:11. The expectation is that its ZK session
will be closed and the registered ephemeral node will be removed.

The expectation will be correct if this Kafka broker really died. But if it was a transient
interruption, i.e. network issues between that Kafka broker and ZooKeeper, then the session
will be renewed and ephemeral nodes appertain to that session will be intact. From the original
log posted, the session {{0x25cd1e82c110001}} was renewed (though the timestamp of the log
does not match {{Aug 2, 23:16:11}} so probably we are not talking about same event. In any
case, do we have any evidence that indicates the broker in question really died?

bq. the following entry doesn't necessarily mean that the ephemeral node is actually removed
since it could be restored.

Correct, I think this log does not imply the final quorum state W.R.T. the ephemeral nodes,
because the logging here was done before a consensus regarding the ephemeral removal proposal
was reached. When a server thinks a client is dead and initiates the session closing, it will
start removing the ephemerals from its data tree while waiting for the votes of the ephemeral
removal proposal, so the ephemeral nodes removal here does not have transactional semantic
because it does not consider the proposal results.

bq. Is there any logging indicating the restore?

When log level is set to DEBUG, {{FinalRequestPrecessor}} will log all requests it processes,
including the ephemeral nodes recreate (which is just normal ephemeral node create since it
has no notion of a state). You can probably search {{type:create}} in logs.

The restore will only happen when a server restarts, or rejoin quorum, and finds out it has
out dated state with rest of quorum. Otherwise, the ephemeral nodes will either be removed
already at quorum level, or not removed at all (on this specific server) :). In both cases
(restarts/rejoin quorum), the server will recover the expected state from rest of quorum by
replaying transactions logs. 

> an expired ZK session can be re-established
> -------------------------------------------
>
>                 Key: ZOOKEEPER-2867
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2867
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.10
>            Reporter: Jun Rao
>         Attachments: zk.0.formatted, zk.1.formatted
>
>
> Not sure if this is a real bug, but I found an instance when a ZK client seems to be
able to renew a session already expired by the ZK server.
> From ZK server log, session 25cd1e82c110001 was expired at 22:04:39.
> {code:java}
> June 27th 2017, 22:04:39.000	INFO	org.apache.zookeeper.server.ZooKeeperServer	Expiring
session 0x25cd1e82c110001, timeout of 12000ms exceeded
> June 27th 2017, 22:04:39.001	DEBUG	org.apache.zookeeper.server.quorum.Leader	Proposing::
sessionid:0x25cd1e82c110001 type:closeSession cxid:0x0 zxid:0x200000fc4 txntype:-11 reqpath:n/a
> June 27th 2017, 22:04:39.001	INFO	org.apache.zookeeper.server.PrepRequestProcessor	Processed
session termination for sessionid: 0x25cd1e82c110001
> June 27th 2017, 22:04:39.001	DEBUG	org.apache.zookeeper.server.quorum.CommitProcessor
Processing request:: sessionid:0x25cd1e82c110001 type:closeSession cxid:0x0 zxid:0x200000fc4
txntype:-11 reqpath:n/a
> June 27th 2017, 22:05:20.324	INFO	org.apache.zookeeper.server.quorum.Learner	Revalidating
client: 0x25cd1e82c110001
> June 27th 2017, 22:05:20.324	INFO	org.apache.zookeeper.server.ZooKeeperServer	Client
attempting to renew session 0x25cd1e82c110001 at /100.96.5.6:47618
> June 27th 2017, 22:05:20.325	INFO	org.apache.zookeeper.server.ZooKeeperServer	Established
session 0x25cd1e82c110001 with negotiated timeout 12000 for client /100.96.5.6:47618
> {code}
> From ZK client's log, it was able to renew the expired session on 22:05:20.
> {code:java}
> June 27th 2017, 22:05:18.590	INFO	org.apache.zookeeper.ClientCnxn	Client session timed
out, have not heard from server in 4485ms for sessionid 0x25cd1e82c110001, closing socket
connection and attempting reconnect	0
> June 27th 2017, 22:05:18.590	WARN	org.apache.zookeeper.ClientCnxn	Client session timed
out, have not heard from server in 4485ms for sessionid 0x25cd1e82c110001	0
> June 27th 2017, 22:05:19.325	WARN	org.apache.zookeeper.ClientCnxn	SASL configuration
failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client'
was found in specified JAAS configuration file: '/opt/confluent/etc/kafka/server_jaas.conf'.
Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server
allows it.	0
> June 27th 2017, 22:05:19.326	INFO	org.apache.zookeeper.ClientCnxn	Opening socket connection
to server 100.65.188.168/100.65.188.168:2181	0
> June 27th 2017, 22:05:20.324	INFO	org.apache.zookeeper.ClientCnxn	Socket connection established
to 100.65.188.168/100.65.188.168:2181, initiating session	0
> June 27th 2017, 22:05:20.327	INFO	org.apache.zookeeper.ClientCnxn	Session establishment
complete on server 100.65.188.168/100.65.188.168:2181, sessionid = 0x25cd1e82c110001, negotiated
timeout = 12000	0
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message