zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "yangwei (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2219) ZooKeeper server should better handle SessionMovedException.
Date Tue, 30 Oct 2018 00:54:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667946#comment-16667946
] 

yangwei commented on ZOOKEEPER-2219:
------------------------------------

https://issues.apache.org/jira/browse/ZOOKEEPER-2886 fixed

> ZooKeeper server should better handle SessionMovedException.
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2219
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2219
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.5
>            Reporter: zhihai xu
>            Priority: Major
>
> ZooKeeper server should better handle SessionMovedException.
> We hit the SessionMovedException. the following is the reason for the SessionMovedException
we find:
> 1. ZK client tried to connect to Leader L. Network was very slow, so before leader processed
the request, client disconnected.
> 2. Client then re-connected to Follower F reusing the same session ID. It was successful.
> 3. The request in step 1 went into leader. Leader processed it and invalidated the connection
created in step 2. But client didn't know the connection it used is invalidated.
> 4. Client got SessionMovedException when it used the connection invalidated by leader
for any ZooKeeper operation.
> The following are logs: c045dkh is the Leader, c470udy is the Follower and the sessionID
is 0x14be28f50f4419d.
> 1. ZK client try to initiate session to Leader at 015-03-16 10:59:40,735 and timeout
after 10/3 seconds.
> logs from ZK client 
> {code}
> 2015-03-16 10:59:40,078 INFO org.apache.zookeeper.ClientCnxn: Client session timed out,
have not heard from server in 6670ms for sessionid 0x14be28f50f4419d, closing socket connection
and attempting reconnect
> 015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection
to server c045dkh/?.?.?.67:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Socket connection established
to c045dkh/?.?.?.67:2181, initiating session
> 2015-03-16 10:59:44,071 INFO org.apache.zookeeper.ClientCnxn: Client session timed out,
have not heard from server in 3336ms for sessionid 0x14be28f50f4419d, closing socket connection
and attempting reconnect
> {code}
> 2. ZK client initiated session to Follower successfully at 2015-03-16 10:59:44,688
> logs from ZK client
> {code}
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection
to server c470udy/?.?.?.65:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Socket connection established
to c470udy/?.?.?.65:2181, initiating session
> 2015-03-16 10:59:44,688 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete
on server c470udy/?.?.?.65:2181, sessionid = 0x14be28f50f4419d, negotiated timeout = 10000
> {code}
> logs from ZK Follower server
> {code}
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted
socket connection from /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting
to renew session 0x14be28f50f4419d at /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.quorum.Learner: Revalidating
client: 0x14be28f50f4419d
> 2015-03-16 10:59:44,675 INFO org.apache.zookeeper.server.ZooKeeperServer: Established
session 0x14be28f50f4419d with negotiated timeout 10000 for client /?.?.?.65:42777
> {code}
> 3. At 2015-03-16 10:59:45,668, Leader processed the delayed request which is sent by
Client at 2015-03-16 10:59:40,735, right after session was established, it close the socket
connection/session because client was already disconnected due to timeout.
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting
to renew session 0x14be28f50f4419d at /?.?.?.65:50271
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: Established
session 0x14be28f50f4419d with negotiated timeout 10000 for client /?.?.?.65:50271
> 2015-03-16 10:59:45,670 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing
close of session 0x14be28f50f4419d due to java.io.IOException: Broken pipe
> 2015-03-16 10:59:45,671 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket
connection for client /?.?.?.65:50271 which had sessionid 0x14be28f50f4419d
> {code}
> 4. Client got SessionMovedException at 2015-03-16 10:59:45,693
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,693 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e3 zxid:0x1c002a4e53
txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode
= Session moved
> 2015-03-16 10:59:45,695 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e5 zxid:0x1c002a4e56
txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode
= Session moved
> 2015-03-16 10:59:45,700 INFO org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
KeeperException when processing sessionid:0x14be28f50f4419d type:multi cxid:0x86e7 zxid:0x1c002a4e57
txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:null Error:KeeperErrorCode
= Session moved
> {code}
> 5. At 2015-03-16 10:59:45,710, we close the session 0x14be28f50f4419d but the socket
connection between ZK client and ZK Follower is closed at 2015-03-16 10:59:45,715 after session
termination.
> logs from ZK Leader server:
> {code}
> 2015-03-16 10:59:45,710 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed
session termination for sessionid: 0x14be28f50f4419d
> {code}
> logs from ZK Follower server:
> {code}
> 2015-03-16 10:59:45,715 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket
connection for client /?.?.?.65:42777 which had sessionid 0x14be28f50f4419d
> {code}
> It looks like Zk client is out-of-sync with ZK server.
> My question is how ZK client can recover from this error. It looks like the ZK Client
won't be disconnected from Follower until session is closed and any ZooKeeper operation will
fail with SessionMovedException before session is closed.
> Also since ZK Leader already closed the socket connection/session to ZK Client at step
3, why it still reject the ZooKeeper operation from client with SessionMovedException. Will
it be better to endorse the session/connection between ZK client and ZK Follower? This seems
like a bug to me. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message