hadoop-zookeeper-user mailing list archives

From Travis Crawford <traviscrawf...@gmail.com>
Subject Re: monitoring zookeeper
Date Thu, 15 Apr 2010 02:43:48 GMT
Hey Kishore -

Thanks for the info. I found an interesting library called jmxetric
(http://code.google.com/p/jmxetric) that reads MBeans and publishes their
contents to Ganglia, and it's working pretty well. A simplified config looks
like:

<jmxetric-config>
  <jvm process="Zookeeper"/>
  <sample delay="60">
    <mbean
        name="org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.3,name2=Leader"
        pname="ZK">
      <attribute name="AvgRequestLatency" type="double"/>
      <attribute name="MaxRequestLatency" type="double"/>
      <attribute name="MinRequestLatency" type="double"/>
      <attribute name="OutstandingRequests" type="double"/>
      <attribute name="PacketsReceived" type="double"/>
      <attribute name="PacketsSent" type="double"/>
    </mbean>
  </sample>
</jmxetric-config>

It doesn't solve the nested property issue, unfortunately, so I may have to
flatten some statistics as you have. I'm interested in checking out your
code if you don't mind.
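A minimal Python sketch of that flattening, assuming bean names shaped like the ones in this thread: it splits an ObjectName into its nameN key properties, joins them into a key in the same style as Kishore's (ReplicatedServer_idXXX_replica.XXX_Leader.AvgRequestLatency), and reads the leader/follower mode from name2. The helpers `parse_object_name`, `name_parts`, `flat_key` and `server_mode` are hypothetical, and the parser does not handle quoted ObjectName values.

```python
def parse_object_name(object_name):
    """Split 'domain:k1=v1,k2=v2' into (domain, {k: v}).

    Simplified parser (assumption): no quoted values and no commas
    inside values, which holds for the ZooKeeperService names here.
    """
    domain, _, props = object_name.partition(":")
    pairs = (item.split("=", 1) for item in props.split(",") if item)
    return domain, {key: value for key, value in pairs}


def name_parts(props):
    """Return the nameN values in numeric order (name0, name1, ...)."""
    keys = [k for k in props if k.startswith("name") and k[4:].isdigit()]
    return [props[k] for k in sorted(keys, key=lambda k: int(k[4:]))]


def flat_key(object_name, attribute):
    """Flatten a bean name plus attribute into a single key."""
    _, props = parse_object_name(object_name)
    return "_".join(name_parts(props)) + "." + attribute


def server_mode(object_name):
    """Return 'Leader', 'Follower', ... from name2, or None if absent.

    Assumption: in the replicated-server beans shown in this thread,
    name2 is the only place the current mode appears.
    """
    _, props = parse_object_name(object_name)
    return props.get("name2")


bean = ("org.apache.ZooKeeperService:"
        "name0=ReplicatedServer_id3,name1=replica.3,name2=Leader")
print(flat_key(bean, "AvgRequestLatency"))
# -> ReplicatedServer_id3_replica.3_Leader.AvgRequestLatency
print(server_mode(bean))
# -> Leader
```

Publishing server_mode as its own metric, or mapping each mode to a separate key as Kishore does, keeps the Ganglia metric names stable even though the bean names change with the server id and role.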


At a higher level, I'm interested in setting up the sort of monitoring one
would expect of a critical datacenter service. To start with, I'd like to
collect the data necessary to:

- page when there's no leader
- page when only the minimum number of replicas needed for quorum are present
- email when replicas are missing but the ensemble is still above the quorum minimum

For example, in a five-server ensemble, send an email when one server is down
and page when two are down. Also page if there's no leader for any other
reason. Operational metrics such as latencies, connections, and requests would
be useful for troubleshooting as well as capacity planning.
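The thresholds above follow from ZooKeeper's majority quorum: an ensemble of n voting servers needs floor(n/2) + 1 of them, so 3 of 5. A small sketch of the alert rule, assuming every server is a voting participant and that the up-count and leader check come from whatever collector is in use; `Alert`, `quorum_size` and `classify` are hypothetical names.

```python
from enum import Enum


class Alert(Enum):
    OK = "ok"
    EMAIL = "email"
    PAGE = "page"


def quorum_size(ensemble_size):
    """Smallest strict majority of the voting servers, e.g. 3 of 5."""
    return ensemble_size // 2 + 1


def classify(ensemble_size, servers_up, has_leader):
    """Map ensemble health to an alert level (assumes all servers vote)."""
    if not has_leader:
        return Alert.PAGE  # no leader: page regardless of the cause
    if servers_up <= quorum_size(ensemble_size):
        return Alert.PAGE  # at bare quorum: one more failure loses it
    if servers_up < ensemble_size:
        return Alert.EMAIL  # degraded, but still above the quorum minimum
    return Alert.OK


for up in range(5, 1, -1):
    print(up, classify(5, up, has_leader=True).value)
```

With three servers the same rule pages on the first failure, since two servers is already the bare quorum.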

--travis




On Wed, Apr 14, 2010 at 4:50 PM, kishore g <g.kishore@gmail.com> wrote:

> Hi Travis,
>
> We do monitor zookeeper using JMX. We have some simple code that does the
> following:
>
>   - parse the JMX output and convert it into key-value format; nested
>   properties are flattened.
>   - emit the key-value pairs using the LWES (http://www.lwes.org/) APIs at a
>   regular, configurable interval.
>   - the keys to be emitted are configured via a config file.
> We have our own internal reporting framework that displays these metrics.
> To differentiate between leader and follower, we map the follower and leader
> attributes to separate keys:
>
> ReplicatedServer_idXXX_replica.XXX_Follower.AvgRequestLatency=rsf_mrl
> ReplicatedServer_idXXX_replica.XXX_Leader.AvgRequestLatency=rsl_mrl
>
> If the server is the leader, rsf_mrl will be empty, and vice versa. I can
> provide the code that does this; you can change it to meet your needs and
> extend it to work with Ganglia. Let me know if this helps.
>
> thanks,
> Kishore G
>
> On Wed, Apr 14, 2010 at 11:12 AM, Travis Crawford
> <traviscrawford@gmail.com> wrote:
>
> > Hey zookeeper gurus -
> >
> > Are there any recommended ways to monitor zookeeper ensembles? I'm
> > familiar with the four-letter words and know that stats are published via
> > JMX; I'm more interested in what people are doing with those stats.
> >
> > I'd like to publish the JMX stats to Ganglia, and this works well for the
> > built-in stats. However, the zookeeper-specific names appear to be dynamic,
> > which causes issues when deciding what to publish. For example, the current
> > mode (leader/follower) appears to be accessible only from the bean names,
> > rather than from, say, a "mode" stat.
> >
> > org.apache.ZooKeeperService:name0=ReplicatedServer_id1,name1=replica.1,name2=Follower
> > org.apache.ZooKeeperService:name0=ReplicatedServer_id2,name1=replica.2,name2=Leader
> >
> > The only way I've found to learn whether replicas are up to date is to look
> > at the "synced?" field buried in followerInfo:
> >
> > $ java -jar cmdline-jmxclient-0.10.5.jar - localhost:8081 \
> >     org.apache.ZooKeeperService:name0=ReplicatedServer_id2,name1=replica.2,name2=Leader \
> >     followerInfo
> > 04/14/2010 18:06:06 +0000 org.archive.jmx.Client followerInfo:
> > FollowerHandler Socket[addr=/10.0.0.10,port=48104,localport=2888]
> > tickOfLastAck:29793 synced?:true queuedPacketLength:0
> > FollowerHandler Socket[addr=/10.0.0.11,port=59599,localport=2888]
> > tickOfLastAck:29793 synced?:true queuedPacketLength:0
> >
> >
> > I don't mind writing a tool to parse the JMX output and publish it to
> > Ganglia if needed, but this seems like a problem that may already have been
> > solved, and I'm curious what others are doing. The tool would basically take
> > the zookeeper stats, normalize the names, and publish them to a time-series
> > database.
> >
> > Is anyone already monitoring ZK in a way others might find useful?
> >
> > Thanks!
> > Travis
> >
>
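The "synced?" flags in the followerInfo output quoted above can be turned into numeric stats with a regular expression and then handed to Ganglia. A sketch, assuming the FollowerHandler text format in that sample (other ZooKeeper versions may print it differently) and assuming Ganglia's gmetric command-line tool as the publisher; the function names and the zk_ metric prefix are hypothetical, and the commands are built but not executed.

```python
import re

# Matches one FollowerHandler entry as printed in the followerInfo sample;
# \s+ lets an entry span a line break, as it does in that output.
FOLLOWER_RE = re.compile(
    r"FollowerHandler\s+Socket\[addr=[^/,\]]*/(?P<addr>[^,\]]+),"
    r"port=(?P<port>\d+)[^\]]*\]\s+"
    r"tickOfLastAck:(?P<tick>\d+)\s+"
    r"synced\?:(?P<synced>true|false)\s+"
    r"queuedPacketLength:(?P<queued>\d+)"
)


def parse_follower_info(text):
    """Return one dict per follower found in followerInfo output."""
    return [
        {
            "addr": m.group("addr"),
            "port": int(m.group("port")),
            "tick_of_last_ack": int(m.group("tick")),
            "synced": m.group("synced") == "true",
            "queued_packets": int(m.group("queued")),
        }
        for m in FOLLOWER_RE.finditer(text)
    ]


def summarize(followers):
    """Collapse per-follower records into flat, stably named metrics."""
    return {
        "zk_followers": len(followers),
        "zk_synced_followers": sum(1 for f in followers if f["synced"]),
        "zk_max_queued_packets": max(
            (f["queued_packets"] for f in followers), default=0),
    }


def gmetric_commands(metrics):
    """Build (but do not run) one gmetric invocation per metric."""
    return [
        ["gmetric", f"--name={name}", f"--value={value}", "--type=uint32"]
        for name, value in metrics.items()
    ]


sample = """\
FollowerHandler Socket[addr=/10.0.0.10,port=48104,localport=2888]
tickOfLastAck:29793 synced?:true queuedPacketLength:0
FollowerHandler Socket[addr=/10.0.0.11,port=59599,localport=2888]
tickOfLastAck:29793 synced?:true queuedPacketLength:0
"""
for cmd in gmetric_commands(summarize(parse_follower_info(sample))):
    print(" ".join(cmd))
```

On a host with gmetric installed, each argument list could be passed to subprocess.run; comparing zk_synced_followers against the expected follower count gives a simple "replicas up to date" check.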
