zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Prasanth Mathialagan <prasanthmathiala...@gmail.com>
Subject Re: [SUGGESTION] JvmPauseMonitor in ZooKeeper
Date Thu, 10 May 2018 01:34:12 GMT
Hi,
This looks cool :) I have a suggestion. It would be nice if we could add
the current size of the heap or (% of heap used) in the log entry whenever
sleep threshold had exceeded a lot. It could be helpful.

On Wed, May 9, 2018 at 11:26 AM, Patrick Hunt <phunt@apache.org> wrote:

> On Wed, May 9, 2018 at 11:11 AM, Norbert Kalmar <nkalmar@cloudera.com>
> wrote:
>
> > Thanks Patrick, great question.
> > My understanding is that this tool not only shows if JVM spends too much
> > time in GC, but if, for any other reason, there is a JVM pause (The tool
> > only differentiates GC pause from all other pause). This could be slow
> > fsync (although we do have logs for that) or even server/OS related.
> >
> > But again, this is just my interpretation. I will ask the source of the
> > idea, what extra benefits this gives them over java GC log.
> >
> > I checked ZK, I don't see it enabled by default, but GC logging can be
> set
> > with JVM parameters easily, so that shouldn't be a key factor anyway.
> >
> >
> I think that would be a useful change regardless - to make it on by default
> I mean. Also some docs wrt our recommendations, how to troubleshoot, etc...
> Adding a feature is useful, but ensuring people know about it and can  use
> it effectively is even more so.
>
> Regards,
>
> Patrick
>
>
> > Regards,
> > Norbert
> >
> > On Wed, May 9, 2018 at 7:57 PM Patrick Hunt <phunt@apache.org> wrote:
> >
> > > Do you know why they did this rather than just enabling GC logging by
> > > default? Why re-invent the wheel?
> > >
> > > I seem to remember seeing a push do enable GC logging by default a few
> > > years ago. In particular around the time when the JVM added GC log
> > rolling
> > > as a feature. Here's an example:
> > >
> > > https://batmat.net/2016/10/17/always-enable-gc-logs-and-how-
> > to-enable-logs-rotation-with-hotspot/
> > > My understanding is that the overhead is so low that it's feasible to
> do
> > > this.
> > >
> > > Good improvement though regardless which way we go.
> > >
> > > Regards,
> > >
> > > Patrick
> > >
> > > On Wed, May 9, 2018 at 9:36 AM, Andor Molnar <andor@cloudera.com>
> wrote:
> > >
> > > > +1 cool!
> > > >
> > > >
> > > > On Wed, May 9, 2018 at 7:59 AM, Norbert Kalmar <nkalmar@cloudera.com
> >
> > > > wrote:
> > > >
> > > > > Okay, thanks Ed, I created the Jira, will look into it soon :)
> > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-3037
> > > > >
> > > > > Regards,
> > > > > Norbert
> > > > >
> > > > > On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro <
> > > edward.ribeiro@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1. Sounds really nice to have feature. Let's open a ticket
and
> > open
> > > a
> > > > > PR.
> > > > > > :)
> > > > > >
> > > > > > Ed
> > > > > >
> > > > > > Em qua, 9 de mai de 2018 11:15, Norbert Kalmar <
> > nkalmar@cloudera.com
> > > >
> > > > > > escreveu:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I just got a tip that we could improve on the logging in
> > ZooKeeper.
> > > > > > After a
> > > > > > > ZK crash, or client timeout sometimes it's hard to determine
> from
> > > the
> > > > > > logs
> > > > > > > what happened. Knowing if ZK was responsive at the time
would
> > help
> > > a
> > > > > lot.
> > > > > > > For example, ZK might spend a lot of time waiting on GC
(there
> is
> > > > still
> > > > > > > some misconception that ZK is a storage).
> > > > > > >
> > > > > > > To help detect this, HADOOP already has a great tool called
JVM
> > > Pause
> > > > > > > Monitor. (As the name suggest, it can be also used for
> > monitoring,
> > > > but
> > > > > it
> > > > > > > also helps post-mortem in a lot of cases). Basically it
has a
> > > daemon
> > > > > that
> > > > > > > sleeps for one second, and if the sleep time exceeds the
1s by
> > more
> > > > > than
> > > > > > > the threshold (1s: INFO, 10s: WARN by default - this can
be
> > > > > configurable
> > > > > > in
> > > > > > > our case, see below), it will alert/make a log entry. It
can
> also
> > > > > monitor
> > > > > > > the time GC took.
> > > > > > >
> > > > > > > Now, this class is in the HADOOP-common. I wouldn't want
to
> > depend
> > > on
> > > > > > > Hadoop-common because of this one feature/class (it is
> actually a
> > > > > single
> > > > > > > class). Since this is a straightforward implementation,
and in
> > the
> > > > past
> > > > > > > five years the few commits it had is nothing really serious,
I
> > > think
> > > > we
> > > > > > > could just copy this class in ZooKeeper, and introduce
it as a
> > > > > > configurable
> > > > > > > feature, by default it can be off.
> > > > > > >
> > > > > > > The class:
> > > > > > >
> > > > > > >
> > > > > > https://github.com/apache/hadoop/blob/trunk/hadoop-
> > > > > common-project/hadoop-common/src/main/java/org/apache/
> > > > > hadoop/util/JvmPauseMonitor.java
> > > > > > >
> > > > > > > What do You think?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Norbert
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message