zookeeper-user mailing list archives

From Jonathan Simms <slyp...@gmail.com>
Subject Re: Healthcheck using the stat command
Date Tue, 24 Jan 2012 04:24:43 GMT
On Mon, Jan 23, 2012 at 5:16 PM, Jordan Zimmerman
<jzimmerman@netflix.com> wrote:
> The problem with 'ruok' is that it doesn't tell you the state of the
> instance. 'ruok' might return 'imok' but the instance might not be serving
> due to some other error. Only a 'stat' will tell you that.
>
> -JZ
>

Can you provide an example of what that would look like? A high
Outstanding count? A Mode that's not "leader", "follower", or
"observer"?

-J
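
One way to act on the signals asked about above (an unexpected Mode, a high Outstanding count) is to parse the stat reply instead of only looking for 'imok'. What follows is a minimal sketch, not a proven production check. The function name zk_stat_healthy and the default limit of 100 outstanding requests are hypothetical choices. It also accepts "standalone" alongside leader/follower/observer, because a single-server install reports that mode. A reply with no Mode line at all, such as the "not currently serving requests" message that some versions send instead of stats, is treated as unhealthy.

```shell
# Hypothetical helper: reads the reply to the 'stat' four-letter word on
# stdin and returns 0 if it looks healthy, 1 otherwise.
# $1 = maximum acceptable Outstanding count (100 is an arbitrary default).
zk_stat_healthy() {
  max_outstanding="${1:-100}"
  awk -v max="$max_outstanding" '
    /^Mode: /        { mode = $2 }
    /^Outstanding: / { outstanding = $2 + 0; seen_outstanding = 1 }
    END {
      # No Mode line (e.g. a "not serving" reply) or an unexpected mode.
      if (mode != "leader" && mode != "follower" &&
          mode != "observer" && mode != "standalone") exit 1
      # Missing Outstanding line, or more requests queued than allowed.
      if (!seen_outstanding || outstanding > max) exit 1
      exit 0
    }'
}
```

Usage, with zk1 standing in for a real server name: echo stat | nc zk1 2181 | zk_stat_healthy 100 || echo "zk1 unhealthy". The threshold worth alerting on depends on your load, so treat 100 as a placeholder.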


> On 1/23/12 1:51 PM, "Philip Smith" <philip_smith@apple.com> wrote:
>
>>There is a batch Java program that does a health check:
>>
>>validateZookeeperService.validateZookeeperService()
>>
>>
>>which basically runs the ruok command. You could run the stat command and
>>parse out the response, but I think 99% of what you want is simply a
>>matter of looking for 'imok' in the response to the ruok command.
>>
>>philip_smith@st11p00td-devlog001:~ 2 $ alias zkok
>>alias zkok='for idx in 1 2 3 4 5 ; do export zkserver="st11p00td-zookeeper00${idx}" ; echo "$zkserver $( echo ruok | nc $zkserver 2181 )" ; done'
>>philip_smith@st11p00td-devlog001:~ 3 $ zkok
>>st11p00td-zookeeper001 imok
>>st11p00td-zookeeper002 imok
>>st11p00td-zookeeper003 imok
>>st11p00td-zookeeper004 imok
>>st11p00td-zookeeper005 imok
>>philip_smith@st11p00td-devlog001:~ 4 $
>>
>>
>>On Jan 23, 2012, at 1:45 PM, Natarajan Suresh wrote:
>>
>>> I am trying to write a small health check script for the ZooKeeper
>>>instances. The "stat" command gives me output like this:
>>> ===========================
>>> Zookeeper version: 3.3.3-1073969, built on 02/23/2011 22:27 GMT
>>> Clients:
>>> /127.0.0.1:38929[0](queued=0,recved=1,sent=0)
>>> /17.155.7.152:37603[1](queued=0,recved=474752,sent=474752)
>>> Latency min/avg/max: 0/0/35
>>> Received: 1113675
>>> Sent: 1113706
>>> Outstanding: 0
>>> Zxid: 0x2000e2925
>>> Mode: follower
>>> Node count: 71
>>> ===========================
>>> How do I know whether the server is OK? I do not have a bad instance
>>>available to check what the output would look like in that case.
>>> If anyone has already written a health check script, could you please
>>>share it with me?
>>> Thanks
>>> |Suresh|
>>
>>
>>Regards, Philip
>>
>>Philip Smith
>>Senior Software Engineer
>>philip_smith@apple.com
>>408 862-1360 office
>>530 574-1659 mobile
>>
>
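
The zkok alias quoted above prints one line per server but always exits 0 (its last command is an echo), so a cron job or monitor cannot tell a failed server from a healthy one. Below is a minimal sketch of the same ruok loop that also returns a non-zero status when any server fails to answer 'imok'. The host names in ZK_HOSTS and the helper names zk_4lw and zk_check_ruok are hypothetical. Like the alias, it calls plain nc with no timeout, so you may want to add whatever timeout option your netcat variant supports.

```shell
# Hypothetical host list and client port; override via the environment.
ZK_HOSTS="${ZK_HOSTS:-zk1 zk2 zk3}"
ZK_PORT="${ZK_PORT:-2181}"

# Send a four-letter word ($1) to host $2 and print the reply.
zk_4lw() {
  echo "$1" | nc "$2" "$ZK_PORT"
}

# Print one line per host; return 1 if any host did not answer 'imok'.
zk_check_ruok() {
  failed=0
  for host in $ZK_HOSTS; do
    reply=$(zk_4lw ruok "$host" 2>/dev/null)
    if [ "$reply" = "imok" ]; then
      echo "$host imok"
    else
      echo "$host FAILED (${reply:-no reply})" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: zk_check_ruok || echo "at least one ZooKeeper server failed ruok". As noted earlier in the thread, 'imok' only shows that the process is answering, so this complements a stat-based check rather than replacing it.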
