zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Interesting elastic/ZK post
Date Mon, 09 May 2016 17:18:54 GMT
Makes sense to me to add it. Could someone create a ZK JIRA? Sounds like a
great starter project for someone interested in getting started with ZK.
3.5+ adds Jetty support for accessing metrics, so it sounds like it would
dovetail nicely.

Patrick

On Mon, May 9, 2016 at 10:12 AM, Chris Nauroth <cnauroth@hortonworks.com>
wrote:

> I always sympathize with a major outage report, but on the bright side, it
> was very satisfying to hear the ZooKeeper cluster had sustained uptime for
> 3 years.  That agrees with my own user experience.  It's often the most
> stable component of a distributed infrastructure (as it needs to be).
>
> As far as potential improvements, I was wondering if it would make sense
> to introduce something like Hadoop's JvmPauseMonitor [1].  This is a
> background thread that attempts to detect GC churn and log warnings about
> it.  This has been very helpful in diagnosing NameNode misconfigurations
> that lead to GC churn.
>
> This wouldn't have prevented the problem for the Elastic Cloud team, but at
> least it would have made the root cause more visible.  A warning about GC
> churn could have been shown in the main ZooKeeper log instead of a
> separate GC log or inferring it from other sources like JMX.
>
> [1] https://s.apache.org/4sdx
>
> --Chris Nauroth
>
>
>
>
> On 5/8/16, 7:37 PM, "Patrick Hunt" <phunt@apache.org> wrote:
>
> >Interesting root cause and mitigations discussion.
> >
> >https://www.elastic.co/blog/elastic-cloud-outage-april-2016
> >
> >Patrick
>
>
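
The pause-monitor idea discussed in this thread (a background thread that
notices when the JVM stalls and logs a warning to the main log) can be
sketched roughly as follows. This is only an illustration of the general
technique, not Hadoop's JvmPauseMonitor code and not an existing ZooKeeper
API. The class name PauseMonitorSketch, its methods, and the sleep and
threshold parameters are all hypothetical. The thread sleeps for a fixed
interval, measures how much longer than that it was actually away, and
reports the overshoot together with the GC time accumulated meanwhile.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical sketch of a JVM pause monitor; names and defaults are assumptions.
class PauseMonitorSketch implements Runnable {
    private static final Logger LOG = Logger.getLogger("PauseMonitorSketch");

    private final long sleepMillis;          // how long each probe sleeps
    private final long warnThresholdMillis;  // overshoot that triggers a warning
    private volatile boolean running = true;

    PauseMonitorSketch(long sleepMillis, long warnThresholdMillis) {
        this.sleepMillis = sleepMillis;
        this.warnThresholdMillis = warnThresholdMillis;
    }

    // Time spent beyond the requested sleep, in milliseconds (never negative).
    static long extraPauseMillis(long startNanos, long endNanos, long sleepMillis) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0L, elapsedMillis - sleepMillis);
    }

    // Sum of accumulated collection time across all collectors known to the JVM.
    static long totalGcTimeMillis() {
        long total = 0L;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    @Override
    public void run() {
        while (running) {
            long gcBefore = totalGcTimeMillis();
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long extra = extraPauseMillis(start, System.nanoTime(), sleepMillis);
            if (extra > warnThresholdMillis) {
                long gcDelta = totalGcTimeMillis() - gcBefore;
                LOG.warning("Detected pause of approximately " + extra
                        + " ms; GC time during that window: " + gcDelta + " ms");
            }
        }
    }

    void stop() {
        running = false;
    }
}
```

A server could start this on a daemon thread at boot, for example
new Thread(new PauseMonitorSketch(500, 1000)).start(), with the interval and
threshold chosen by the operator. A pause with little or no GC time in the
same window would point at something other than the collector, such as
swapping or a stalled host.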
