zookeeper-user mailing list archives

From Sampath Perera <samp...@adroitlogic.com>
Subject Re: Fast leader election initial delay, is that possible?
Date Thu, 18 Aug 2011 16:55:07 GMT
s/one of customer/one of our customer

sorry for the typo.

On Thu, Aug 18, 2011 at 10:24 PM, Sampath Perera <sampath@adroitlogic.com> wrote:

> Hi Flavio,
>
> On Thu, Aug 18, 2011 at 9:24 PM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
>
>> Hi Ted, I don't see how one can automate the distinction between a machine
>> that is down because it crashed and a machine that is down because it hasn't
>> started yet. Assuming that we are logging the machine unavailability as we
>> are doing currently, one can always look at the timestamp of the warning and
>> remember that this is the time the machines were bootstrapping.
>> Consequently, I don't really see the point of reducing the number of
>> warnings, unless the warnings are really polluting the logs. I typically
>> don't see so many that they prevent me from reading the rest, but you may
>> have a different perception. Also, recall that we back off, so the warnings become
>> less frequent over time.
>>
>
> True, but one of customer deployments have a log analyzing tool and sends
> notifications for the errors on the log, as you previously said we cannot
> get an optimal value for this timeout, but we can come up with a sub optimal
> value to get rid of this warning.
>
>
>>
>> I'm open to ideas, though. If you see anything wrong in my rationale or if
>> you have an idea of how to do it differently, then I'd be happy to hear.
>> However, if the idea is simply to add a parameter that configures the time
>> for leader election to start, then I'm currently not in favor.
>>
>
> Well, what I was originally looking for was to delay the leader election,
> but as pointed out by Ted, I was going to provide a patch on printing this
> warning. (If you carefully look at Ted's comment, and my response, I was
> thinking of a timeout before the warning is considered a warning to be
> printed on the log... at least that is what I got from Ted's first comment).
> What do you think about that?
>
>
>>
>> -Flavio
>>
>> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>>
>> Flavio,
>>
>> What you say is correct, but the original poster does have a point that
>> many of these warnings are to be expected and there is a heuristic that
>> might assist in distinguishing some of these cases so that false alarms in
>> the logs could be decreased.
>>
>> That doesn't seem like a big deal to me, but different people have
>> different itches.  In my experience, restarting a ZK cluster from zero
>> almost never happens.
>>
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>
>>
>>
>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <sampath@adroitlogic.com> wrote:
>>
>>
>>
>> Hhmmm, I think this is a bit different isn't it? Here we know that the
>> first server to come will be failing to connect to the other as they are
>> not yet up. Anyway our real issue is the warning.
>>
>>
>>
>> We know that.
>>
>>
>> But how does the server know that it is the first server?  That is the
>> whole point of the leader election.  You might just have a server
>> rejoining a cluster.  Or you might have a cluster that has been turned
>> off.  Or a cluster with 2 out of 5 machines off and we tried to touch the
>> other down machine before the others.
>>
>>
>>
>>
>> Would you like to suggest a patch?
>>
>>
>>
>> Of course I do.. will prepare a patch and attach.
>>
>>
>>
>> Great!
>>
>>
>>
>>
>>   *flavio*
>> *junqueira*
>>
>> research scientist
>>
>> fpj@yahoo-inc.com
>> direct +34 93-183-8828
>>
>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>> phone (408) 349 3300    fax (408) 349 3301
>>
>>
>>
>
>
> --
> Thanks,
> Sampath
> http://adroitlogic.org
>
>


-- 
Thanks,
Sampath
http://adroitlogic.org
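
For illustration, the "timeout for the warning" idea discussed in this thread could be sketched roughly as below. This is only a sketch and not ZooKeeper code: the class name StartupGraceLogger, the grace_seconds parameter, and the log message text are all assumptions made up for the example. It logs a failed peer connection at INFO while the server is still within a startup grace period, and at WARNING afterwards, so a log analyzer that alerts on warnings would stay quiet during bootstrap.

```python
import logging
import time


class StartupGraceLogger:
    """Hypothetical helper: demote connection-failure warnings during startup.

    None of these names come from ZooKeeper; they only illustrate the idea.
    """

    def __init__(self, logger, grace_seconds, clock=time.monotonic):
        self._logger = logger
        self._grace_seconds = grace_seconds
        self._clock = clock            # injectable so the behaviour can be tested
        self._started_at = clock()     # moment this server began bootstrapping

    def level_for_now(self):
        # Inside the grace period a missing peer is expected, so use INFO.
        elapsed = self._clock() - self._started_at
        if elapsed < self._grace_seconds:
            return logging.INFO
        return logging.WARNING

    def connection_failed(self, peer_id):
        # Log the failure at the level chosen above and return that level.
        level = self.level_for_now()
        self._logger.log(level, "Cannot open channel to peer %s", peer_id)
        return level
```

As Flavio notes, no single grace period is optimal for every deployment, so any value chosen here would be a per-site compromise. The sketch also cannot tell a crashed peer from one that has not started yet; it only changes how loudly the failure is reported.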
