zookeeper-user mailing list archives

From Flavio Junqueira <...@yahoo-inc.com>
Subject Re: Fast leader election initial delay, is that possible?
Date Thu, 18 Aug 2011 15:54:51 GMT
Hi Ted, I don't see how one can automate the distinction between a
machine that is down because it crashed and a machine that is down
because it hasn't started yet. Assuming that we keep logging machine
unavailability as we do currently, one can always look at the
timestamp of the warning and recall that this was when the machines
were bootstrapping. Consequently, I don't really see the point of
reducing the number of warnings, unless the warnings are really
polluting the logs. I typically don't see so many that they prevent
me from reading the rest, but you may have a different perception.
Also, recall that we back off, so the warnings become less frequent
over time.
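
As a rough illustration of the back-off mentioned above, the sketch
below doubles the wait between retries until it reaches a cap, so each
new warning arrives later than the previous one. It is not ZooKeeper's
actual code: the function name, the 200 ms starting interval, and the
60 s cap are assumptions chosen for illustration only.

```python
def backoff_intervals(initial_ms=200, cap_ms=60_000, attempts=10):
    """Return the wait (in ms) before each retry, doubling up to a cap.

    All parameter values are illustrative assumptions, not values taken
    from the ZooKeeper source.
    """
    intervals = []
    wait = initial_ms
    for _ in range(attempts):
        intervals.append(wait)
        # Double the wait for the next attempt, but never exceed the cap.
        wait = min(wait * 2, cap_ms)
    return intervals


if __name__ == "__main__":
    for attempt, wait in enumerate(backoff_intervals(), start=1):
        print(f"attempt {attempt}: wait {wait} ms before retrying")
```

With a schedule like this, most warnings cluster in the first few
seconds after startup and then thin out quickly.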

I'm open to ideas, though. If you see anything wrong in my rationale
or if you have an idea of how to do it differently, then I'd be happy
to hear it. However, if the idea is simply to add a parameter that
configures the time for leader election to start, then I'm currently
not in favor.


On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:

> Flavio,
> What you say is correct, but the original poster does have a point that
> many of these warnings are to be expected and there is a heuristic that
> might assist in distinguishing some of these cases so that false alarms
> in the logs could be decreased.
> That doesn't seem like a big deal to me, but different people have
> different itches.  In my experience, restarting a ZK cluster from zero
> almost never happens.
> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <sampath@adroitlogic.com> wrote:
>>> Hhmmm, I think this is a bit different isn't it? Here we know that the
>>> first server to come will be failing to connect to the other as they
>>> are not yet up. Anyway our real issue is the warning.
>> We know that.
>> But how does the server know that it is the first server?  That is the
>> whole point of the leader election.  You might just have a server
>> rejoining a cluster.  Or you might have a cluster that has been turned
>> off.  Or a cluster with 2 out of 5 machines off and we tried to touch
>> the other down machine before the others.
>>>> Would you like to suggest a patch?
>>> Of course I do.. will prepare a patch and attach.
>> Great!


