curator-dev mailing list archives

From "Henrik Nordvik (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-188) Cannot determine the leader if zookeeper leader fails
Date Wed, 08 Apr 2015 07:42:13 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484896#comment-14484896 ]

Henrik Nordvik commented on CURATOR-188:
----------------------------------------

We are experiencing a similar issue. After a period of network problems, no leader is elected.
We use the LeaderSelector recipe and we do get the "reconnected" event, yet no leader is chosen,
because a hanging lock node is still present.

{code}
[zk: localhost:20101(CONNECTED) 0] ls /app/leader/SR
[_c_eadb5f95-ea3c-4bf5-b7b1-c089df38a2bd-lock-0000000746, _c_3c9fd125-e3ce-4ca3-919f-0f5968c2c12c-lock-0000000745,
_c_87358962-171c-4ce2-a34b-92038b400e8d-lock-0000000744]
[zk: localhost:20101(CONNECTED) 1] get /app/leader/SR/_c_eadb5f95-ea3c-4bf5-b7b1-c089df38a2bd-lock-0000000746
10.0.0.148
cZxid = 0x2900012cec
ctime = Sun Mar 29 03:56:17 CEST 2015
mZxid = 0x2900012cec
mtime = Sun Mar 29 03:56:17 CEST 2015
pZxid = 0x2900012cec
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x34c5d99eea20001
dataLength = 10
numChildren = 0
[zk: localhost:20101(CONNECTED) 2] get /app/leader/SR/_c_3c9fd125-e3ce-4ca3-919f-0f5968c2c12c-lock-0000000745
10.0.0.151
cZxid = 0x290000256c
ctime = Sat Mar 28 05:19:43 CET 2015
mZxid = 0x290000256c
mtime = Sat Mar 28 05:19:43 CET 2015
pZxid = 0x290000256c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x14c5d99f0850000
dataLength = 10
numChildren = 0
[zk: localhost:20101(CONNECTED) 3] get /app/leader/SR/_c_87358962-171c-4ce2-a34b-92038b400e8d-lock-0000000744
10.0.0.148
cZxid = 0x29000007bb
ctime = Sat Mar 28 01:24:50 CET 2015
mZxid = 0x29000007bb
mtime = Sat Mar 28 01:24:50 CET 2015
pZxid = 0x29000007bb
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x34c5d99eea20001
dataLength = 10
numChildren = 0
{code}
When we stop the node that holds two locks (10.0.0.148), both locks disappear and the other node
is elected leader.
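
To make the listing above easier to read, here is a minimal sketch in plain Java (no Curator or ZooKeeper dependency) that takes the node names and ephemeralOwner values exactly as printed by zkCli, picks the node with the lowest sequence number, and reports sessions that own more than one lock node. It assumes the usual lock recipe rule that the lowest sequence number holds the lock. The class and method names are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; it works on values copied from zkCli output.
class StaleLockCheck {

    /** Parses the trailing sequence number of a name such as "...-lock-0000000744". */
    static int sequenceOf(String nodeName) {
        return Integer.parseInt(nodeName.substring(nodeName.lastIndexOf('-') + 1));
    }

    /** Returns the node with the lowest sequence number (assumed to be the lock holder). */
    static String lockHolder(Map<String, String> ownerByNode) {
        String holder = null;
        for (String name : ownerByNode.keySet()) {
            if (holder == null || sequenceOf(name) < sequenceOf(holder)) {
                holder = name;
            }
        }
        return holder;
    }

    /** Groups nodes by ephemeralOwner and keeps only sessions that own two or more nodes. */
    static Map<String, List<String>> sessionsWithMultipleNodes(Map<String, String> ownerByNode) {
        Map<String, List<String>> bySession = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : ownerByNode.entrySet()) {
            bySession.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        bySession.values().removeIf(nodes -> nodes.size() < 2);
        return bySession;
    }

    public static void main(String[] args) {
        // Node name -> ephemeralOwner, copied from the zkCli session above.
        Map<String, String> ownerByNode = new LinkedHashMap<>();
        ownerByNode.put("_c_eadb5f95-ea3c-4bf5-b7b1-c089df38a2bd-lock-0000000746", "0x34c5d99eea20001");
        ownerByNode.put("_c_3c9fd125-e3ce-4ca3-919f-0f5968c2c12c-lock-0000000745", "0x14c5d99f0850000");
        ownerByNode.put("_c_87358962-171c-4ce2-a34b-92038b400e8d-lock-0000000744", "0x34c5d99eea20001");

        System.out.println("lowest sequence: " + lockHolder(ownerByNode));
        System.out.println("sessions with several nodes: " + sessionsWithMultipleNodes(ownerByNode));
    }
}
```

With the data above, the lowest-sequence node (0000000744) belongs to the same session as 0000000746, which fits what we see: the old node is never released, so the node on 10.0.0.151 (0000000745) keeps waiting behind it.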

> Cannot determine the leader if zookeeper leader fails
> -----------------------------------------------------
>
>                 Key: CURATOR-188
>                 URL: https://issues.apache.org/jira/browse/CURATOR-188
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 2.7.1
>            Reporter: Rodrigo Nogueira
>
> Hi,
> I'm trying to upgrade the Curator framework from 2.6.0 to 2.7.1, but I'm having some problems.
> In 2.6.0 almost everything works fine, except for ServiceDiscovery.updateService(), which is already fixed in 2.7.1.
> In 2.7.1, when I kill the ZooKeeper leader, my path for leader election becomes inconsistent.
> For instance, I have three apps registered in the leader path (/com/myapp/leader/):
> [_c_85089ba7-0819-40a2-90b5-640bcb5e9e68-lock-0000000003, _c_070619f6-539e-4784-8068-bdc66d2a25bc-lock-0000000005, _c_54a126d3-31e8-464f-9216-5e0ad23fad1b-lock-0000000004]
> After killing the ZooKeeper leader, what I get in /com/myapp/leader/ is:
> [_c_648d5311-a59c-4bc4-bf32-c0605dea9b6a-lock-0000000007, _c_85089ba7-0819-40a2-90b5-640bcb5e9e68-lock-0000000003, _c_f51f9660-3cbf-4ba8-8dba-c1e04ca14a93-lock-0000000008, _c_49696b77-e45a-40b6-8feb-96623c67fd85-lock-0000000006]
> Sometimes I get more nodes (five or six).
> I'm aware that Curator removes and re-adds all nodes when a ZooKeeper node fails, but it seems that the previous nodes are not being removed correctly.
> Is that the expected behavior?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
