incubator-bigtop-dev mailing list archives

From Steve Loughran <>
Subject Re: adding HA monitoring to bigtop
Date Thu, 11 Oct 2012 08:34:54 GMT
On 11 October 2012 00:13, Konstantin Boudnik <> wrote:

> Steve,
> great stuff. Here's my initial feedback:
> 1. I am not passing judgement about how the monitoring is done, although
> something like Nagios would fit the bill well enough, IMO.

Nagios can over-react and send out 4000 emails when a service isn't
responding, without noting that it's down for another reason, and it's
biased towards those email notifications.

> Anyway... It seems
> like this monitoring is very Hadoop HA specific,

It could actually monitor any service with one or more of

The Hadoop-ness currently comes from
 1. specific probes for HDFS and JT
 2. use of hadoop XML config for settings (trivial fix)
 3. probe order fixed in source
 4. no current support for adding new probes just by putting them on the
classpath and declaring them
 5. an installation that goes under /usr/lib/hadoop and picks up the hadoop
classpath and native lib so its hadoop probes are always in sync with the

I'd fix 2 & 3 by having a better config language that lets you specify an
order of operations
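
A rough sketch of what "config decides the order" could look like, with
every name below made up for illustration (this is not the monitor's actual
code): probes are registered by name, and a plain list taken from the config
gives the run order. Stopping at the first failing probe is my assumption
here, not something the current code is confirmed to do.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical probe registry: name -> zero-argument check returning True if healthy.
ProbeRegistry = Dict[str, Callable[[], bool]]


def run_probes(order: List[str], registry: ProbeRegistry) -> List[Tuple[str, bool]]:
    """Run probes in the order given by the config.

    Assumption: the run stops at the first failing probe, so later
    (usually more expensive) probes are skipped once a basic check fails.
    """
    results: List[Tuple[str, bool]] = []
    for name in order:
        check = registry[name]  # raises KeyError for an undeclared probe
        ok = check()
        results.append((name, ok))
        if not ok:
            break
    return results


# Illustrative stand-ins for real probes.
registry: ProbeRegistry = {
    "port": lambda: True,
    "http": lambda: False,
    "hdfs": lambda: True,
}

# The order would come from the config file rather than from source.
results = run_probes(["port", "http", "hdfs"], registry)
```

Reordering the probes then means editing the list, not the source.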

> I would say that it is better
> kept in Hadoop in one form or another - hadoop/contrib seems like a good
> place to start. In other words, I don't think this is generic enough
> monitoring software to be included in the BigTop.


> Say, I'd be happy to
> include Ganglia or some Nagios hooks for the same purposes.  Packaging for
> this monitoring software can of course be added to the BigTop stack as we
> are doing for many other components - it looks like a very reasonable
> approach.
> 2. The failure-inducing library seems like a great addition to the iTest. In
> fact, if I were doing Hadoop fault injection again I would certainly go with
> a MOP'ping and Groovy-based framework, instead of AspectJ boredom. So, I like
> the idea and it seems to fit very well with the original design ideas of the
> iTest - let's add the library to the BigTop. There are things to look at and
> discuss of course, but I like the overall idea!

OK, this bit of it is very immature and might ultimately go into its own
module, so that Hadoop HA tests can use it too.

> To summarize: I'm rather negative on keeping the monitoring software as a
> part of the BigTop; and I am quite positive on bringing the testing lib in
> as a part of the iTest.

I'll have a look at iTest and see where it fits in, then we can start
thinking about what a good test framework for triggering infrastructures
would be. I think what I've got is just a starting point. FWIW jclouds is
looking at vbox integration too, via its Web Service API. It could be used
to trigger VM death in any virtual infrastructure; we'd just need to add
back ends for physical infrastructures (for now: dialog boxes & fencing
scripts), and the code to cause trouble inside the VM itself.

BTW, one thing you can do with virtual infrastructure is forced volume
unmounts, umount -f, which could be used to simulate disk, disk controller
or disk driver problems. Something like that would be really good for
generating stress tests of all the storage layers.
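
To make that concrete, here is a minimal sketch of a fault-injection helper
built around umount -f. The function names and the /mnt/data path are
hypothetical; it defaults to a dry run that only returns the command, since
actually running it needs root and will break whatever is using the volume.

```python
import subprocess
from typing import List


def forced_unmount_command(mount_point: str) -> List[str]:
    """Build the argv for a forced unmount of the given mount point."""
    return ["umount", "-f", mount_point]


def inject_unmount(mount_point: str, dry_run: bool = True) -> List[str]:
    """Simulate a disk failure by force-unmounting a volume.

    With dry_run=True (the default) nothing is executed; the command is
    only returned so a test harness can log or inspect it.
    """
    cmd = forced_unmount_command(mount_point)
    if not dry_run:
        # Raises CalledProcessError if umount exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


# /mnt/data is an illustrative mount point, not a path from the monitor.
planned = inject_unmount("/mnt/data")
```

A real test would call this inside the VM with dry_run=False and then check
how the storage layer above reacts.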

