zookeeper-dev mailing list archives

From Norbert Kalmar <nkal...@cloudera.com>
Subject Re: [SUGGESTION] JvmPauseMonitor in ZooKeeper
Date Thu, 10 May 2018 09:00:48 GMT
I added the suggestions to the Jira.

Thanks all!

On Thu, May 10, 2018 at 3:34 AM Prasanth Mathialagan <prasanthmathialagan@gmail.com> wrote:

> Hi,
> This looks cool :) I have a suggestion. It would be nice if we could add
> the current heap size (or % of heap used) to the log entry whenever the
> sleep threshold has been exceeded by a lot. It could be helpful.
>
> On Wed, May 9, 2018 at 11:26 AM, Patrick Hunt <phunt@apache.org> wrote:
>
> > On Wed, May 9, 2018 at 11:11 AM, Norbert Kalmar <nkalmar@cloudera.com>
> > wrote:
> >
> > > Thanks Patrick, great question.
> > > My understanding is that this tool not only shows whether the JVM spends
> > > too much time in GC, but also whether there is a JVM pause for any other
> > > reason (the tool only distinguishes GC pauses from all other pauses). This
> > > could be slow fsync (although we do have logs for that) or even something
> > > server/OS related.
> > >
> > > But again, this is just my interpretation. I will ask the source of the
> > > idea what extra benefits this gives them over the Java GC log.
> > >
> > > I checked ZK and I don't see it enabled by default, but GC logging can be
> > > set easily with JVM parameters, so that shouldn't be a key factor anyway.
> > >
> > >
> > I think that would be a useful change regardless (making it on by default,
> > I mean). Also some docs wrt our recommendations, how to troubleshoot,
> > etc. Adding a feature is useful, but ensuring people know about it and
> > can use it effectively is even more so.
> >
> > Regards,
> >
> > Patrick
> >
> >
> > > Regards,
> > > Norbert
> > >
> > > On Wed, May 9, 2018 at 7:57 PM Patrick Hunt <phunt@apache.org> wrote:
> > >
> > > > Do you know why they did this rather than just enabling GC logging by
> > > > default? Why re-invent the wheel?
> > > >
> > > > I seem to remember seeing a push to enable GC logging by default a few
> > > > years ago, in particular around the time when the JVM added GC log
> > > > rolling as a feature. Here's an example:
> > > >
> > > > https://batmat.net/2016/10/17/always-enable-gc-logs-and-how-to-enable-logs-rotation-with-hotspot/
> > > > My understanding is that the overhead is so low that it's feasible to
> > > > do this.
> > > >
> > > > Good improvement though regardless which way we go.
> > > >
> > > > Regards,
> > > >
> > > > Patrick
> > > >
> > > > On Wed, May 9, 2018 at 9:36 AM, Andor Molnar <andor@cloudera.com> wrote:
> > > >
> > > > > +1 cool!
> > > > >
> > > > >
> > > > > On Wed, May 9, 2018 at 7:59 AM, Norbert Kalmar <nkalmar@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > Okay, thanks Ed, I created the Jira and will look into it soon :)
> > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-3037
> > > > > >
> > > > > > Regards,
> > > > > > Norbert
> > > > > >
> > > > > > On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro <edward.ribeiro@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1. Sounds like a really nice-to-have feature. Let's open a ticket
> > > > > > > and open a PR.
> > > > > > > :)
> > > > > > >
> > > > > > > Ed
> > > > > > >
> > > > > > > On Wed, May 9, 2018, 11:15 Norbert Kalmar <nkalmar@cloudera.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I just got a tip that we could improve on the logging in
> > > > > > > > ZooKeeper. After a ZK crash or client timeout, it's sometimes hard
> > > > > > > > to determine from the logs what happened. Knowing whether ZK was
> > > > > > > > responsive at the time would help a lot. For example, ZK might
> > > > > > > > spend a lot of time waiting on GC (there is still some
> > > > > > > > misconception that ZK is a storage system).
> > > > > > > >
> > > > > > > > To help detect this, Hadoop already has a great tool called JVM
> > > > > > > > Pause Monitor. (As the name suggests, it can also be used for
> > > > > > > > monitoring, but it also helps post-mortem in a lot of cases.)
> > > > > > > > Basically, it has a daemon that sleeps for one second, and if the
> > > > > > > > sleep time exceeds the 1s by more than the threshold (1s: INFO,
> > > > > > > > 10s: WARN by default; this can be made configurable in our case,
> > > > > > > > see below), it will alert/make a log entry. It can also monitor
> > > > > > > > the time GC took.
> > > > > > > >
> > > > > > > > Now, this class is in hadoop-common. I wouldn't want to depend on
> > > > > > > > hadoop-common because of this one feature/class (it is actually a
> > > > > > > > single class). Since this is a straightforward implementation, and
> > > > > > > > in the past five years the few commits it has had were nothing
> > > > > > > > really serious, I think we could just copy this class into
> > > > > > > > ZooKeeper and introduce it as a configurable feature, off by
> > > > > > > > default.
> > > > > > > >
> > > > > > > > The class:
> > > > > > > >
> > > > > > > >
> > > > > > > > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
> > > > > > > >
> > > > > > > > What do you think?
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Norbert
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
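
For readers who want to see the mechanism described in the thread (a daemon thread that sleeps for one second and reports when it wakes up late by more than a threshold), here is a minimal Java sketch. It is not the Hadoop class linked above: the class name `PauseMonitorSketch`, its methods, and printing to stderr instead of a logger are hypothetical choices for illustration. Only the 1 s sleep interval and the 1 s INFO / 10 s WARN defaults come from the thread. Following Prasanth's suggestion, it also reports the percentage of heap in use, read from the standard `java.lang.management` MXBeans.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a JVM pause detector in the spirit of Hadoop's JvmPauseMonitor. */
public class PauseMonitorSketch implements Runnable {
    private static final long SLEEP_MS = 1000;  // nominal sleep interval (1 s, as in the thread)
    private final long infoThresholdMs;         // extra delay that triggers an INFO entry
    private final long warnThresholdMs;         // extra delay that triggers a WARN entry
    private volatile boolean running = true;

    public PauseMonitorSketch(long infoThresholdMs, long warnThresholdMs) {
        this.infoThresholdMs = infoThresholdMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    /** Maps the extra (unexpected) sleep time to a level; "NONE" means nothing to report. */
    static String classify(long extraMs, long infoMs, long warnMs) {
        if (extraMs > warnMs) return "WARN";
        if (extraMs > infoMs) return "INFO";
        return "NONE";
    }

    /** Accumulated GC time in ms, summed over all collectors that report it. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) total += t;
        }
        return total;
    }

    /** Percentage of the maximum heap currently used, or NaN if the max is undefined. */
    static double heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 if undefined
        return max > 0 ? 100.0 * heap.getUsed() / max : Double.NaN;
    }

    @Override
    public void run() {
        while (running) {
            long gcBefore = totalGcTimeMs();
            long start = System.nanoTime();
            try {
                Thread.sleep(SLEEP_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and exit
                return;
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long extraMs = elapsedMs - SLEEP_MS; // how late the wake-up was
            String level = classify(extraMs, infoThresholdMs, warnThresholdMs);
            if (!level.equals("NONE")) {
                long gcMs = totalGcTimeMs() - gcBefore; // GC time reported during this interval
                // A real integration would use ZooKeeper's logger rather than stderr.
                System.err.printf("%s: JVM paused ~%d ms (GC reported %d ms), heap %.1f%% used%n",
                        level, extraMs, gcMs, heapUsedPercent());
            }
        }
    }

    /** Asks the loop to exit after the current sleep completes. */
    public void stop() {
        running = false;
    }
}
```

To try it, start it on a daemon thread, e.g. `Thread t = new Thread(new PauseMonitorSketch(1000, 10000)); t.setDaemon(true); t.start();`, so the monitor never keeps the JVM alive on its own. Comparing the GC time delta with the observed delay is what lets a log entry separate GC pauses from other stalls (slow I/O, OS scheduling, and so on), which is the distinction discussed earlier in the thread.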
