zookeeper-user mailing list archives

From "Figura, Mark" <mfig...@empirix.com>
Subject RE: Handling of xid rollover
Date Wed, 22 Jun 2016 21:54:30 GMT
Thanks for the responses.

Patrick: Thanks for the example of shutting down ZK for an hour. That makes a lot of sense.

Looking further at our application logs, I see that only SOME instances actually saw a lost
session, not ALL as I had thought. Other instances saw the lost connection, but were able to
reestablish it within a short time. The instances that saw a session loss also have an unexpected
gap in application log timestamps, so I'm assuming this is something on my end.

This caused our processing glitch because we are handling connection loss and session loss the
same way, as recommended in the Curator LeaderSelector docs. I'll look into whether we should
handle those two cases separately. I suppose the ultimate solution would be for our app to recover
from a leader change more quickly, though...
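
To make the "handle those cases separately" idea concrete, here is a minimal sketch of the decision logic only. It does not call Curator: the State enum is a hypothetical stand-in that borrows the names of Curator's ConnectionState values, and the Action enum and class name are invented for illustration. It assumes the application can pause its leader-only work while the connection is suspended and resume it if the connection comes back within the same session; whether that is safe depends on the application.

```java
// Sketch only. State borrows the names of Curator's ConnectionState values,
// but this class is hypothetical and does not use the Curator library.
final class ConnectionHandlingSketch {
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }
    enum Action { NONE, PAUSE_WORK, RESUME_WORK, FULL_RESTART }

    private boolean paused = false;

    // Decide what the application should do for each connection state change.
    Action onStateChange(State newState) {
        switch (newState) {
            case SUSPENDED:
                // Connection dropped, session may still be alive: stop acting
                // as leader, but keep components up.
                paused = true;
                return Action.PAUSE_WORK;
            case RECONNECTED:
                // Same session resumed: pick up where we left off.
                if (paused) {
                    paused = false;
                    return Action.RESUME_WORK;
                }
                return Action.NONE;
            case LOST:
                // Session is gone: tear down and redo leader election.
                paused = false;
                return Action.FULL_RESTART;
            default:
                return Action.NONE;
        }
    }

    public static void main(String[] args) {
        ConnectionHandlingSketch handler = new ConnectionHandlingSketch();
        System.out.println(handler.onStateChange(State.SUSPENDED));   // PAUSE_WORK
        System.out.println(handler.onStateChange(State.RECONNECTED)); // RESUME_WORK
        System.out.println(handler.onStateChange(State.LOST));        // FULL_RESTART
    }
}
```

With this split, a short disconnect such as the one caused by a leader re-election would cost a pause instead of a full shutdown and restart of each component.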

Thank you!
Mark

-----Original Message-----
From: Jordan Zimmerman [mailto:jordan@jordanzimmerman.com] 
Sent: Wednesday, June 22, 2016 2:10 PM
To: user@zookeeper.apache.org
Subject: Re: Handling of xid rollover

Curator 3.0 will simulate a session expiration when there’s a network partition, but Curator
2.0 does not. If you’re using ZK 3.4.5 you’d be using Curator 2.0 so the only way you’d
see a session expiration is when you successfully reconnect to the ensemble.

-JZ

> On Jun 22, 2016, at 12:58 PM, Patrick Hunt <phunt@apache.org> wrote:
> 
> Hi Mark. See this jira for background:
> https://issues.apache.org/jira/browse/ZOOKEEPER-1277
> 
> However, what you describe is correct behavior from our perspective. 
> When the lower 32 bits roll over we now (that was the fix) force a 
> re-election of the leader. Leader re-election causes the quorum to 
> stop serving clients until a new quorum forms.
> 
> Leader re-election is normal behavior for the ZK service. It happens 
> whenever the current leader is lost and a new quorum, with a (possibly 
> new) leader, needs to re-form, for example when the current leader 
> process is restarted. Your clients need to be able to handle this 
> situation (typically the client library does this for you).
> 
> That said, you should not be seeing session expiration as a result of this.
> Client timeouts certainly, but not session expiration. It might happen 
> for other reasons, but the leader is the one responsible for expiring sessions.
> If there is no leader (e.g. being re-elected) there is no session 
> expiration. When the new leader is elected it will reset the clock on 
> session expiration, for all sessions, from the time it's reelected. 
> For example, you can shut down the entire ZK server ensemble, start it 
> back up an hour later and the clients should all be able to rejoin. 
> Hm, that said I'm not sure if Curator is doing some special magic, 
> that's the behavior of the stock client that we ship.
> 
> Patrick
> 
> 
> On Wed, Jun 22, 2016 at 6:18 AM, Figura, Mark <mfigura@empirix.com> wrote:
> 
>> Hi,
>> 
>> We are using ZooKeeper 3.4.5 along with Curator to perform leader 
>> elections and also store some application data on a 3-node ensemble. 
>> Our application is not hard-realtime, but glitches in stream 
>> processing do get noticed and may raise support tickets.
>> 
>> Yesterday, we had such a glitch and by looking through the logs, I 
>> found there was an XID rollover. When this happened, a new election 
>> within the ensemble was triggered and all client connections were 
>> closed. From our application's point of view (possibly filtered 
>> through Curator), we saw the session expire and then the connection 
>> was lost. This caused our application to shut down each component, 
>> re-perform leader elections, and eventually start back up.
>> 
>> We do have an issue where our application is making many more writes 
>> than it should, but once this is fixed, we'll still run into an XID 
>> rollover sooner or later.
>> 
>> Is there something our application can do to handle this situation better?
>> Are there any plans for ZooKeeper to handle this situation without 
>> closing client connections?
>> 
>> Thanks!
>> Mark
>> 
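
The rollover discussed in this thread concerns ZooKeeper's 64-bit transaction id (zxid), whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch counter. The sketch below illustrates that layout and gives a rough estimate of how long a counter lasts at a steady write rate. The class and method names are hypothetical and are not ZooKeeper API; the write rate is an assumed figure, not one taken from the thread.

```java
// Sketch only: illustrates the zxid layout (epoch in the high 32 bits,
// counter in the low 32 bits). Names here are illustrative, not ZooKeeper API.
final class ZxidSketch {
    static final long COUNTER_MASK = 0xffffffffL;

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & COUNTER_MASK);
    }

    // True when the low 32 bits are all ones, i.e. the next write would
    // overflow the counter and a new epoch (new election) is needed.
    static boolean counterExhausted(long zxid) {
        return counterOf(zxid) == COUNTER_MASK;
    }

    // Rough number of days until the counter is exhausted at a steady rate.
    static double daysUntilRollover(long zxid, double writesPerSecond) {
        long remaining = COUNTER_MASK - counterOf(zxid);
        return remaining / writesPerSecond / 86_400.0;
    }

    public static void main(String[] args) {
        long zxid = makeZxid(5, 0xfffffffeL);
        System.out.println(counterExhausted(zxid));     // false
        System.out.println(counterExhausted(zxid + 1)); // true
        // Assumed rate of 1000 writes per second, starting from a fresh epoch.
        System.out.println(daysUntilRollover(makeZxid(6, 0), 1000.0));
    }
}
```

At an assumed 1,000 writes per second, a fresh counter lasts roughly 50 days, so reducing the write rate postpones the rollover but does not remove it.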
