zookeeper-user mailing list archives

From Sampath Perera <samp...@adroitlogic.com>
Subject Re: Fast leader election initial delay, is that possible?
Date Sat, 20 Aug 2011 02:30:43 GMT
Hi Vishal,

On Sat, Aug 20, 2011 at 1:43 AM, Vishal Kher <vishalmlst@gmail.com> wrote:

> My few cents..
> I am not sure if we can distinguish between spurious/non-spurious warnings
> and I don't think we can time it well. The delay is applicable only in
> certain cases. If the user knows that there will be a start up delay, then
> the user can ignore those errors or modify their scripts to start the server
> after a delay.


I think you misinterpreted it :-( Starting the server after a delay is not a
solution to the original problem I was referring to. I also do not see how my
original problem could be fixed through a script; at least, I do not know how
to do it. Maybe by changing the log level to something like FATAL and
reverting it to INFO after the delay? I do not think that is a good idea, as
it would cut off some of the output that I want to see.
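
For illustration, a narrower alternative to raising the global log level is a
filter that drops only the expected warnings during a startup window. The
sketch below is not ZooKeeper code and rests on several assumptions: it uses
java.util.logging only to stay self-contained (ZooKeeper logs through log4j,
so it would need adapting), the class name StartupGraceFilter is hypothetical,
and the marker text and the twenty-second window are placeholders taken from
the discussion in this thread.

```java
import java.util.function.LongSupplier;
import java.util.logging.Filter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

/**
 * Hypothetical filter: hides WARNING records containing a marker string
 * while the process is younger than a grace period. All other records,
 * and all records after the grace period, pass through unchanged.
 */
final class StartupGraceFilter implements Filter {
    private final LongSupplier clock;   // returns the current time in milliseconds
    private final long startMillis;     // time at which the filter was created
    private final long graceMillis;     // length of the startup window
    private final String marker;        // text identifying the expected warning

    StartupGraceFilter(LongSupplier clock, long graceMillis, String marker) {
        this.clock = clock;
        this.startMillis = clock.getAsLong();
        this.graceMillis = graceMillis;
        this.marker = marker;
    }

    @Override
    public boolean isLoggable(LogRecord record) {
        boolean inGrace = clock.getAsLong() - startMillis < graceMillis;
        boolean isWarning = record.getLevel().intValue() == Level.WARNING.intValue();
        String message = record.getMessage();
        boolean matches = message != null && message.contains(marker);
        // Suppress only matching warnings inside the grace window.
        return !(inGrace && isWarning && matches);
    }
}
```

Such a filter could be attached with Handler.setFilter, for example
handler.setFilter(new StartupGraceFilter(System::currentTimeMillis, 20_000L,
"Connection refused")). Unlike switching to FATAL, INFO output and unrelated
warnings stay visible during startup.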


> Does this have to be implemented in the server? It sounds to me like this is
> something that user scripts should handle.
>

As I said, I do not see how a user script can handle this. If there is any
option, please do let me know.

Sampath


>
>
> On Fri, Aug 19, 2011 at 7:00 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
>
>> Sampath, Do you think something along the lines of what Ted describes
>> would work for you?
>>
>> -Flavio
>>
>> On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:
>>
>> The thought is that a server would not complain about connection refused
>> or inability to form a quorum during the first (say) twenty seconds of
>> operation.
>>
>> The thesis is that warnings from these causes during that time are
>> spurious.
>>
>> As I mentioned, I don't see this as urgent or even necessarily a good
>> idea.  I completely reboot a ZK cluster once every year or three.  When I am
>> doing a rolling upgrade, I *want* to see alerts when I bounce a machine.  If
>> I don't want to see those alerts, my monitoring system allows me to put a
>> machine into maintenance mode for a short period of time to temporarily
>> suppress the warnings.
>>
>> All I was doing was translating and elaborating the original poster's
>> suggestion, not so much endorsing it.
>>
>> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
>>
>>> Hi Ted, I don't see how one can automate the distinction between a
>>> machine that is down because it crashed and a machine that is down because
>>> it hasn't started yet. Assuming that we are logging the machine
>>> unavailability as we are doing currently, one can always look at the
>>> timestamp of the warning and remember that this is the time the machines
>>> were bootstrapping. Consequently, I don't really see the point of reducing
>>> the number of warnings, unless the warnings are really polluting the logs. I
>>> typically don't see so many that they prevent me from reading the rest, but you
>>> may have a different perception. Also, recall that we back off, so the
>>> warnings become less frequent over time.
>>>
>>> I'm open to ideas, though. If you see anything wrong in my rationale or
>>> if you have an idea of how to do it differently, then I'd be happy to hear.
>>> However, if the idea is simply to add a parameter that configures the time
>>> for leader election to start, then I'm currently not in favor.
>>>
>>> -Flavio
>>>
>>> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>>>
>>> Flavio,
>>>
>>> What you say is correct, but the original poster does have a point that
>>> many of these warnings are to be expected and there is a heuristic that
>>> might assist in distinguishing some of these cases so that false alarms in
>>> the logs could be decreased.
>>>
>>> That doesn't seem like a big deal to me, but different people have
>>> different itches.  In my experience, restarting a ZK cluster from zero
>>> almost never happens.
>>>
>>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>
>>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <sampath@adroitlogic.com> wrote:
>>>
>>> Hmm, I think this is a bit different, isn't it? Here we know that the first
>>> server to come up will fail to connect to the others, as they are not yet
>>> up. Anyway, our real issue is the warning.
>>>
>>>
>>>
>>> We know that.
>>>
>>>
>>> But how does the server know that it is the first server?  That is the
>>> whole point of the leader election.  You might just have a server rejoining
>>> a cluster.  Or you might have a cluster that has been turned off.  Or a
>>> cluster with 2 out of 5 machines off and we tried to touch the other down
>>> machine before the others.
>>>
>>>
>>>
>>>
>>> Would you like to suggest a patch?
>>>
>>>
>>>
>>> Of course I do.. will prepare a patch and attach.
>>>
>>>
>>>
>>> Great!
>>>
>>>
>>>
>>>
>>>   *flavio*
>>> *junqueira*
>>>
>>> research scientist
>>>
>>> fpj@yahoo-inc.com
>>> direct +34 93-183-8828
>>>
>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>> phone (408) 349 3300    fax (408) 349 3301
>>>
>>>
>>>
>>
>>
>>
>>
>


-- 
Thanks,
Sampath
http://adroitlogic.org
