zookeeper-user mailing list archives

From Sampath Perera <samp...@adroitlogic.com>
Subject Re: Fast leader election initial delay, is that possible?
Date Sat, 20 Aug 2011 02:23:08 GMT
Yeah, that will work for me.

Also, it is just going to be a configuration option, and the overhead it
introduces applies only when this error occurs, since it is just
an if statement before printing out the error.

The default behavior will not be changed and I do not expect any overhead to
be introduced with this to the default case.
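
As a rough illustration of the idea (this is not actual ZooKeeper code; the
class name, the property name example.startupWarnGraceMs and the method names
are all hypothetical), the check could look something like this:

```java
import java.util.function.LongSupplier;

/** Hypothetical helper, not part of ZooKeeper: gates startup-time warnings. */
public class StartupWarningGate {
    private final long startMillis;   // time the server (this gate) was created
    private final long graceMillis;   // how long to stay quiet after startup
    private final LongSupplier clock; // injectable clock, eases testing

    public StartupWarningGate(long graceMillis, LongSupplier clock) {
        this.graceMillis = graceMillis;
        this.clock = clock;
        this.startMillis = clock.getAsLong();
    }

    /**
     * Reads the (assumed) system property; the default of 0 ms means
     * every failure is logged, i.e. the default behavior is unchanged.
     */
    public static StartupWarningGate fromSystemProperty() {
        long grace = Long.getLong("example.startupWarnGraceMs", 0L);
        return new StartupWarningGate(grace, System::currentTimeMillis);
    }

    /** The single "if" evaluated just before the warning is printed. */
    public boolean shouldWarn() {
        return clock.getAsLong() - startMillis >= graceMillis;
    }
}
```

The connection-failure path would then wrap its existing log call in
"if (gate.shouldWarn()) { ... }", so the check costs nothing unless the
error actually occurs.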

OTOH, I am OK to leave it as it is and let our customer know that this is
how it works :-) Actually, my original intention was to find out whether there
is any such configuration, as I was unable to find it in the docs.

So, if the majority of the devs are not in favour of this change, I will not
do it.

Thanks for all your feedback!

Sampath

On Fri, Aug 19, 2011 at 4:30 PM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:

> Sampath, Do you think something along the lines of what Ted describes would
> work for you?
>
> -Flavio
>
> On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:
>
> The thought is that a server would not complain about connection refused or
> inability to form a quorum during the first (say) twenty seconds of
> operation.
>
> The thesis is that warnings from these causes during that time are
> spurious.
>
> As I mentioned, I don't see this as urgent or even necessarily a good idea.
>  I completely reboot a ZK cluster once every year or three.  When I am doing
> a rolling upgrade, I *want* to see alerts when I bounce a machine.  If I
> don't want to see those alerts, my monitoring system allows me to put a
> machine into maintenance mode for a short period of time to temporarily
> suppress the warnings.
>
> All I was doing was translating and elaborating the original poster's
> suggestion, not so much endorsing it.
>
> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
>
>> Hi Ted, I don't see how one can automate the distinction between a machine
>> that is down because it crashed and a machine that is down because it hasn't
>> started yet. Assuming that we are logging the machine unavailability as we
>> are doing currently, one can always look at the timestamp of the warning and
>> remember that this is the time the machines were bootstrapping.
>> Consequently, I don't really see the point of reducing the number of
>> warnings, unless the warnings are really polluting the logs. I typically
>> don't see so many that they prevent me from reading the rest, but you may have a
>> different perception. Also, recall that we back off, so the warnings become
>> less frequent over time.
>>
>> I'm open to ideas, though. If you see anything wrong in my rationale or if
>> you have an idea of how to do it differently, then I'd be happy to hear.
>> However, if the idea is simply to add a parameter that configures the time
>> for leader election to start, then I'm currently not in favor.
>>
>> -Flavio
>>
>> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>>
>> Flavio,
>>
>> What you say is correct, but the original poster does have a point that many
>> of these warnings are to be expected and there is a heuristic that might
>> assist in distinguishing some of these cases so that false alarms in the
>> logs could be decreased.
>>
>> That doesn't seem like a big deal to me, but different people have different
>> itches.  In my experience, restarting a ZK cluster from zero almost never
>> happens.
>>
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>
>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <sampath@adroitlogic.com> wrote:
>>
>> Hhmmm, I think this is a bit different isn't it? Here we know that the first
>> server to come will be failing to connect to the other as they are not yet
>> up. Anyway our real issue is the warning.
>>
>> We know that.
>>
>> But how does the server know that it is the first server?  That is the
>> whole point of the leader election.  You might just have a server rejoining
>> a cluster.  Or you might have a cluster that has been turned off.  Or a
>> cluster with 2 out of 5 machines off and we tried to touch the other down
>> machine before the others.
>>
>> Would you like to suggest a patch?
>>
>> Of course I do.. will prepare a patch and attach.
>>
>> Great!
>>
>>   *flavio*
>> *junqueira*
>>
>> research scientist
>>
>> fpj@yahoo-inc.com
>> direct +34 93-183-8828
>>
>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>> phone (408) 349 3300    fax (408) 349 3301
>>
>>
>>


-- 
Thanks,
Sampath
http://adroitlogic.org
