hadoop-zookeeper-user mailing list archives

From Benjamin Reed <br...@yahoo-inc.com>
Subject Re: Session expiration caused by time change
Date Thu, 19 Aug 2010 22:51:31 GMT
i'm updating ZOOKEEPER-366 with this discussion and will try to get a patch
out. Qing (or anyone else), can you reproduce it pretty easily?

thanx
ben

On 08/19/2010 09:29 AM, Ted Dunning wrote:
> Nice (modulo inverting the < in your text).
>
> Option 2 seems very simple.  That always attracts me.
>
> On Thu, Aug 19, 2010 at 9:19 AM, Benjamin Reed <breed@yahoo-inc.com> wrote:
>
>> yes, you are right. we could do this. it turns out that the expiration code
>> is very simple:
>>
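>>             while (running) {
>>                 // wall-clock time: this is what jumps when the system clock is stepped
>>                 currentTime = System.currentTimeMillis();
>>                 if (nextExpirationTime > currentTime) {
>>                     // the next bucket is not due yet, so sleep until it is
>>                     this.wait(nextExpirationTime - currentTime);
>>                     continue;
>>                 }
>>                 // expire every session in the bucket that is now due
>>                 SessionSet set = sessionSets.remove(nextExpirationTime);
>>                 if (set != null) {
>>                     for (SessionImpl s : set.sessions) {
>>                         sessionsById.remove(s.sessionId);
>>                         expirer.expire(s);
>>                     }
>>                 }
>>                 // buckets are exactly one expirationInterval apart
>>                 nextExpirationTime += expirationInterval;
>>             }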
>>
>> so we can detect a jump very easily: if nextExpirationTime > currentTime,
>> we have jumped ahead in time.
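A minimal sketch of the detection step, with the comparison in the direction Ted's reply points to: after a forward jump the thread wakes up with currentTime well past nextExpirationTime. The class and method names are hypothetical, not ZooKeeper code, and treating lateness beyond one whole expirationInterval as a jump is an assumed threshold.

```java
// Hypothetical helper, not ZooKeeper code: decides whether the expiration
// thread woke up much later than it planned to.
public final class ClockJumpDetector {
    private ClockJumpDetector() {}

    /**
     * Returns the estimated forward jump in milliseconds, or 0 if none.
     * Assumption: the thread normally wakes at, or shortly after,
     * nextExpirationTime, so lateness beyond a whole expirationInterval
     * is treated as a clock jump rather than scheduling delay.
     */
    public static long estimateForwardJump(long nextExpirationTime,
                                           long currentTime,
                                           long expirationInterval) {
        long lateness = currentTime - nextExpirationTime;
        return lateness > expirationInterval ? lateness : 0L;
    }

    public static void main(String[] args) {
        // On-time wakeup: 5 ms late with a 2000 ms tick.
        System.out.println(estimateForwardJump(10_000, 10_005, 2_000)); // 0
        // Clock stepped forward by about a minute.
        System.out.println(estimateForwardJump(10_000, 70_000, 2_000)); // 60000
    }
}
```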
>>
>> now the question is, what do we do with this information?
>>
>> option 1) we could figure out the jump (nextExpirationTime-currentTime is a
>> good estimate) and move all of the sessions forward by that amount.
>> option 2) we could converge on the time by having a policy to always wait
>> at least half a tick time.
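Both options can be sketched under stated assumptions; none of these names are ZooKeeper APIs. For option 1 the sketch uses currentTime - nextExpirationTime (the sign flipped, per Ted's reply) as the jump, and rounds it down to whole intervals so the shifted keys stay on the grid that nextExpirationTime steps through; the rounding is an assumption. For option 2 it simulates the quoted loop after a forward jump, adding a hypothetical rule that consecutive buckets are at least half a tick of real (monotonic) time apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketches of the two options; not ZooKeeper code.
public final class JumpOptions {
    private JumpOptions() {}

    /** Option 1: move every bucket forward by the (interval-aligned) jump.
     *  nextExpirationTime would have to move by the same aligned amount. */
    public static <S> TreeMap<Long, S> shiftBuckets(Map<Long, S> buckets,
                                                    long jumpMs, long interval) {
        long aligned = (jumpMs / interval) * interval;
        TreeMap<Long, S> shifted = new TreeMap<>();
        for (Map.Entry<Long, S> e : buckets.entrySet()) {
            shifted.put(e.getKey() + aligned, e.getValue());
        }
        return shifted;
    }

    /** Option 2: simulate the loop after the wall clock jumped forward by
     *  jumpMs, never expiring two buckets less than minGapMs of real time
     *  apart. Returns the real time (ms since the jump) of each expiry. */
    public static List<Long> expiryTimes(long interval, long jumpMs,
                                         long minGapMs, int buckets) {
        List<Long> times = new ArrayList<>();
        long real = 0;        // monotonic time since the jump
        long next = interval; // nextExpirationTime, in wall-clock ms
        for (int i = 0; i < buckets; i++) {
            long wall = real + jumpMs;                 // the jumped wall clock
            long clockWait = Math.max(0, next - wall); // what the quoted loop waits
            long paceWait = times.isEmpty()
                    ? 0 : Math.max(0, times.get(times.size() - 1) + minGapMs - real);
            real += Math.max(clockWait, paceWait);
            times.add(real);
            next += interval;
        }
        return times;
    }

    public static void main(String[] args) {
        // 2 s tick, clock stepped forward by 10 s.
        // No pacing: [0, 0, 0, 0, 0, 2000, 4000, 6000, 8000, 10000]
        System.out.println(expiryTimes(2_000, 10_000, 0, 10));
        // Half-tick pacing: [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 10000]
        System.out.println(expiryTimes(2_000, 10_000, 1_000, 10));
    }
}
```

Without pacing, five buckets expire in one burst the moment the jump is seen; with half-tick pacing the loop works through them at double speed and returns to full-tick spacing once the wall clock and nextExpirationTime line up again.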
>>
>> there probably are other options as well. i kind of like option 2. the
>> worst case is that sessions expire in half the time that they should,
>> but this shouldn't be too much of a problem since clients send a ping if
>> they are idle for 1/3 of their session timeout.
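A rough check of that worst case for a session timeout T, ignoring tick rounding and network delay:

```latex
t_{\text{ping}} = \tfrac{T}{3} \;<\; t_{\text{expire}}^{\text{worst}} = \tfrac{T}{2},
\qquad \text{slack} = \tfrac{T}{2} - \tfrac{T}{3} = \tfrac{T}{6}
```

For example, with T = 30 s an idle client pings after 10 s, while the halved expiration would not fire until 15 s, leaving about 5 s of slack.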
>>
>> ben
>>
>>
>> On 08/19/2010 08:39 AM, Ted Dunning wrote:
>>
>>> True.  But it knows that there has been a jump.
>>>
>>> Quiet time can be distinguished from clock shift by assuming that members
>>> of the cluster don't all jump at the same time.
>>>
>>> I would imagine that a "recent clock jump" estimate could be kept, and
>>> buckets that would otherwise expire due to such a jump could be given a
>>> bit of a second lease on life, delaying their expiration. Since time-outs
>>> are relatively short, the server would be able to forget about the jump
>>> very shortly.
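A minimal sketch of the "recent clock jump" lease, assuming the server keeps one jump estimate and forgets it after a window on the order of the longest session timeout. The class, its methods, and forgetAfterMs are hypothetical, not ZooKeeper code. Times passed in are assumed to be monotonic (for example System.nanoTime() / 1_000_000) so the bookkeeping is not itself fooled by the jump.

```java
// Hypothetical sketch of the "recent clock jump" idea; not ZooKeeper code.
public final class RecentJumpLease {
    private final long forgetAfterMs; // assumption: about the longest session timeout
    private long jumpMs = 0;          // latest estimated forward jump
    private long observedAtMs = 0;    // monotonic time at which it was observed

    public RecentJumpLease(long forgetAfterMs) {
        this.forgetAfterMs = forgetAfterMs;
    }

    /** Remember a jump estimate, e.g. currentTime - nextExpirationTime. */
    public void recordJump(long estimatedJumpMs, long nowMs) {
        this.jumpMs = estimatedJumpMs;
        this.observedAtMs = nowMs;
    }

    /** Extra lease for a bucket about to expire at monotonic time nowMs:
     *  the jump estimate while the jump is recent, otherwise 0. */
    public long extraLeaseMs(long nowMs) {
        boolean recent = jumpMs > 0 && nowMs - observedAtMs <= forgetAfterMs;
        return recent ? jumpMs : 0L;
    }

    public static void main(String[] args) {
        RecentJumpLease lease = new RecentJumpLease(30_000);
        lease.recordJump(60_000, 5_000);
        System.out.println(lease.extraLeaseMs(6_000));  // 60000: jump is recent
        System.out.println(lease.extraLeaseMs(40_000)); // 0: jump forgotten
    }
}
```

A caller would defer each bucket at most once; otherwise the same bucket could be pushed back repeatedly while the jump is still considered recent.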
>>>
>>> On Thu, Aug 19, 2010 at 8:22 AM, Benjamin Reed <breed@yahoo-inc.com> wrote:
>>>
>>>> if we try to use network messages to detect and correct the situation, it
>>>> seems like we would recreate the problem we are having with ntp, since that
>>>> is exactly what it does.

