zookeeper-user mailing list archives

From Szalay-Bekő Máté <szalay.beko.m...@gmail.com>
Subject Re: ZooKeeper Cluster Health Checking
Date Wed, 23 Sep 2020 13:21:31 GMT
Hi Adrien,

I noticed you are setting "dataLogDir" to /var/log/zookeeper. Please
note that ZooKeeper stores its transaction logs in the dataLogDir, and
these are real data needed for ZooKeeper recovery. They are not the
regular application log text files that you would usually put into
/var/log.
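
To make the point above concrete, here is a rough sketch that reads dataLogDir from a config file and warns when it points under /var/log. The helper names cfg_value and check_datalogdir, and the config path in the usage comment, are my own assumptions and not something shipped with ZooKeeper.

```shell
# Hypothetical helpers, not part of ZooKeeper.

# Print the value of a key from a zoo.cfg-style file.
# $1 = key, $2 = path to the config file. The last occurrence wins.
cfg_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Warn when the transaction log directory sits under /var/log.
# $1 = path to the config file. Returns 1 on a warning, 0 otherwise.
check_datalogdir() {
    dir=$(cfg_value dataLogDir "$1")
    case "$dir" in
        "")
            echo "dataLogDir is not set; transaction logs are written to dataDir" ;;
        /var/log|/var/log/*)
            echo "WARNING: dataLogDir=$dir is under /var/log"
            return 1 ;;
        *)
            echo "OK: dataLogDir=$dir" ;;
    esac
}

# Example (the config path is an assumption):
# check_datalogdir /etc/zookeeper/conf/zoo.cfg
```

The check only flags the location. Moving an existing transaction log directory is a separate operational step that this sketch does not attempt.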

Otherwise, as far as I can tell, your config seems to be OK. ZooKeeper
should trigger the autopurge job every 48 hours, keeping only the 3
most recent snapshots (plus the transaction logs from the same time
period). However, this ZooKeeper version (3.4.10) is an old one and is
no longer officially supported by the community. You should consider
upgrading your ZooKeeper cluster regardless of the autopurge problem.
There might also be some autopurge fixes in more recent versions.
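
One way to see whether autopurge is doing anything is to count the files on disk before and after a purge interval. A minimal sketch, assuming the usual on-disk layout where snapshots are named snapshot.<zxid> and transaction logs log.<zxid> inside a version-2 subdirectory. count_zk_files is a made-up helper name.

```shell
# Hypothetical helper: count files with a given prefix in a ZooKeeper directory.
# $1 = dataDir or dataLogDir, $2 = file prefix ("snapshot" or "log").
# Prints 0 when the version-2 subdirectory does not exist.
count_zk_files() {
    find "$1/version-2" -maxdepth 1 -type f -name "$2.*" 2>/dev/null | wc -l | tr -d ' '
}

# Example with the directories from the config quoted below:
# echo "snapshots: $(count_zk_files /var/lib/zookeeper snapshot)"
# echo "txn logs:  $(count_zk_files /var/log/zookeeper log)"
```

With autopurge.snapRetainCount=3, the snapshot count should drop back to 3 after a purge run. If it only ever grows, the purge job is probably not running.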

You could also try to kick off the purge job manually (and look for
errors in the log). I have never done this myself, but there is an
example command in the documentation:
java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.17.jar:conf \
    org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count>

See: https://zookeeper.apache.org/doc/r3.4.14/zookeeperAdmin.html
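
For your setup, the documentation command could be wrapped roughly like this. It is only a sketch: ZK_HOME, run_purge and DRY_RUN are names I made up, the jar names are copied from the documentation example and may differ in your installation, and my understanding (worth double-checking) is that the first argument is the transaction log directory and the second the snapshot directory.

```shell
# ZK_HOME is an assumption: the directory that contains zookeeper.jar, lib/ and conf/.
ZK_HOME=${ZK_HOME:-/opt/zookeeper}
# Class path copied from the documentation example; adjust to the jars you actually have.
ZK_CP="zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.17.jar:conf"

# Hypothetical wrapper around PurgeTxnLog.
# $1 = transaction log dir, $2 = snapshot dir, $3 = number of snapshots to keep.
# With DRY_RUN=1 (the default) the command is only printed, not executed.
run_purge() {
    set -- java -cp "$ZK_CP" org.apache.zookeeper.server.PurgeTxnLog "$1" "$2" -n "$3"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        (cd "$ZK_HOME" && "$@")
    fi
}

# Example with the directories from the config quoted below:
# run_purge /var/log/zookeeper /var/lib/zookeeper 3              # print only
# DRY_RUN=0 run_purge /var/log/zookeeper /var/lib/zookeeper 3    # actually purge
```

Printing the command first makes it easy to review the directories before anything is deleted.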

Best regards,
Mate


On Wed, Sep 23, 2020 at 11:04 AM Enrico Olivelli <eolivelli@gmail.com> wrote:
>
> Adrien
>
> On Wed, 23 Sep 2020 at 10:59, adrien ruffie <
> adriennolarsen@hotmail.fr> wrote:
>
> > Hello all,
> >
> > I have a problem in production ...
> >
> > We have the following zoo configuration file:
> >
> > tickTime=4000
> > dataDir=/var/lib/zookeeper
> >
> > dataLogDir=/var/log/zookeeper
> >
> > initLimit=30
> > syncLimit=15
> >
> > autopurge.snapRetainCount=3
> > autopurge.purgeInterval=48
> >
> > clientPort=2181
> > maxClientCnxns=60
> >
> > server.1=ZOO1:2888:3888
> > server.2=ZOO2:2888:3888
> > server.3=ZOO3:2888:3888
> > server.4=ZOO4:2888:3888
> > server.5=ZOO5:2888:3888
> >
> > We are on zookeeper-3.4.10, but we recently noticed that logs and
> > snapshots aren't being purged.
> > Do you know of this issue? Is it a bug or a bad configuration?
> >
>
> Do you see any errors in the logs?
>
> Are you using the standard Apache distribution?
>
> Enrico
>
>
> >
> > Thank you very much and best regards
> >
> > Adrien Ruffié
> > ________________________________
> > From: adrien ruffie <adriennolarsen@hotmail.fr>
> > Sent: Wednesday, 18 July 2018 09:01
> > To: user@zookeeper.apache.org <user@zookeeper.apache.org>
> > Subject: RE: ZooKeeper Cluster Health Checking
> >
> > OK, thanks Harish,
> >
> > I'll keep that idea in mind!
> >
> >
> > Best regards,
> >
> >
> > Adrien
> >
> > ________________________________
> > From: harish lohar <hklohar@gmail.com>
> > Sent: Tuesday, 17 July 2018 23:13:28
> > To: user@zookeeper.apache.org
> > Subject: Re: ZooKeeper Cluster Health Checking
> >
> > We did it with a Java monitoring app, using the ZooKeeper Java API, which
> > sends the 4lw (four-letter-word) commands to ZooKeeper and returns the output.
> >
> >
> > Thanks
> > Harish
> >
> > On Tue, Jul 17, 2018 at 2:00 AM adrien ruffie <adriennolarsen@hotmail.fr>
> > wrote:
> >
> > > Hi Harish,
> > >
> > >
> > > Thank you very much for this advice and explanation!
> > >
> > > Do you think a simple shell script that checks all these metrics
> > > is enough? Or would it be better to do it in Java, with a simple
> > > monitoring application?
> > >
> > >
> > > Thanks again,
> > >
> > >
> > > Best regards,
> > >
> > >
> > > Adrien
> > >
> > > ________________________________
> > > From: harish lohar <hklohar@gmail.com>
> > > Sent: Tuesday, 17 July 2018 04:13:51
> > > To: user@zookeeper.apache.org
> > > Subject: Re: ZooKeeper Cluster Health Checking
> > >
> > > Hi Adrien,
> > > The ZooKeeper commands below are generally used to check the health of a
> > > ZooKeeper cluster.
> > >
> > > stat
> > >
> > > Lists brief details for the server and connected clients.
> > >
> > > Usage: echo stat | nc server port
> > >
> > > This tells you whether the cluster is up or down. If it is down, the
> > > reply says that the ZooKeeper instance is not currently serving
> > > requests, which means that either the leader election is failing or
> > > at least half of the ZooKeeper nodes in the cluster are down, so
> > > there is no quorum.
> > >
> > >
> > > mntr
> > >
> > > *New in 3.4.0:* Outputs a list of variables that could be used for
> > > monitoring the health of the cluster.
> > >
> > > $ echo mntr | nc localhost 2185
> > >
> > > zk_version  3.4.0
> > > zk_avg_latency  0
> > > zk_max_latency  0
> > > zk_min_latency  0
> > > zk_packets_received 70
> > > zk_packets_sent 69
> > > zk_outstanding_requests 0
> > > zk_server_state leader
> > > zk_znode_count   4
> > > zk_watch_count  0
> > > zk_ephemerals_count 0
> > > zk_approximate_data_size    27
> > > zk_followers    4                   - only exposed by the Leader
> > > zk_synced_followers 4               - only exposed by the Leader
> > > zk_pending_syncs    0               - only exposed by the Leader
> > > zk_open_file_descriptor_count 23    - only available on Unix platforms
> > > zk_max_file_descriptor_count 1024   - only available on Unix platforms
> > >
> > > The output is compatible with the Java properties format and the content
> > > may change over time (new keys added). Your scripts should expect changes.
> > >
> > > ATTENTION: Some of the keys are platform specific and some of the keys
> > > are only exported by the Leader.
> > >
> > > The output contains multiple lines in the key/value format shown above.
> > >
> > >
> > > On Mon, Jul 16, 2018 at 10:13 AM adrien ruffie <adriennolarsen@hotmail.fr>
> > > wrote:
> > >
> > > > Hello all,
> > > >
> > > >
> > > > In my company we have a Zookeeper production cluster.
> > > >
> > > >
> > > > But we don't really know how to check the health of our cluster...
> > > >
> > > >
> > > > Can you advise us on this topic?
> > > >
> > > >
> > > > I know this topic has probably come up before, but I haven't
> > > > found any concrete solution.
> > > >
> > > >
> > > > Do you use a monitoring tool that can raise alerts?
> > > >
> > > > Which metrics or properties can indicate that our cluster
> > > > isn't in good health?
> > > >
> > > >
> > > > Thank you very much and best regards
> > > >
> > > >
> > > > Adrien
> > > >
> > >
> >
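
The stat and mntr commands described in the thread above lend themselves to a small shell check. This is only a sketch under a few assumptions: nc is installed, the four-letter commands are reachable on the client port, and the helper names four_letter, mntr_value and mntr_health are invented for illustration. The host names and port in the usage comment come from the config quoted above.

```shell
# Send a four-letter command to a server. $1 = command, $2 = host, $3 = port.
four_letter() {
    echo "$1" | nc "$2" "$3"
}

# Read mntr output on stdin and print the value of one key. $1 = key.
mntr_value() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# Read mntr output on stdin and print a simple verdict.
# Returns 0 for OK, 1 when the leader has unsynced followers,
# 2 when no server state could be found (e.g. the server is not serving requests).
mntr_health() {
    out=$(cat)
    state=$(printf '%s\n' "$out" | mntr_value zk_server_state)
    case "$state" in
        leader)
            followers=$(printf '%s\n' "$out" | mntr_value zk_followers)
            synced=$(printf '%s\n' "$out" | mntr_value zk_synced_followers)
            if [ "$followers" = "$synced" ]; then
                echo "OK leader synced_followers=$synced"
            else
                echo "WARN leader followers=$followers synced_followers=$synced"
                return 1
            fi ;;
        follower|observer|standalone)
            echo "OK $state" ;;
        *)
            echo "CRIT no usable mntr output"
            return 2 ;;
    esac
}

# Example:
# for h in ZOO1 ZOO2 ZOO3 ZOO4 ZOO5; do
#     printf '%s: ' "$h"; four_letter mntr "$h" 2181 | mntr_health
# done
```

Because the leader-only keys are read only when zk_server_state is leader, the same function can be pointed at every server in the ensemble.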
