hbase-dev mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: On unit tests and hudson (WAS -> Re: Hudson build is back to normal : HBase-TRUNK #1551)
Date Fri, 15 Oct 2010 21:25:43 GMT
Agreed on all fronts.

Awesome work wrangling all these beasts, stack.

Maybe now we can actually expect a successful run of the entire test suite when adding new
stuff :)


Specifically on the ZK cleanup thing, it seems to me there are two things broken.  Yes,
cleanup of ZK is not right, but even so the next test should get the proper ZK port
to use rather than a stale one.  This looks like the issue underlying a number of
oddities in HC/HCM that I've been seeing with failover tests.

JG

> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> Stack
> Sent: Friday, October 15, 2010 10:47 AM
> To: dev@hbase.apache.org
> Subject: On unit tests and hudson (WAS -> Re: Hudson build is back to
> normal : HBase-TRUNK #1551)
> 
> Yeah, I'm fixing it (smile).
> 
> Tests are almost back to normal.  There's still some flakiness to
> eradicate.  Almost there.
> 
> While there's a bit of a focus on tests, I'd like to petition that
> going forward we do all we can to keep tests in the blue.  Here's why
> (mostly informed by what I learned over the last week working with
> hudson):
> 
> + Hudson is always right.  If he fails a build, there is a cause.  The
> cause of failure may be indecipherable, seemingly from the realm of
> shadows, but digging will turn up the cause. Eventually.  Here's some
> recent 'interesting' illustration:
> ++ Our TableOutputFormat has been broke, probably since the day it was
> originally written more than a year (or two) ago in that it was not
> reading the config. set by job setup.   This plus a test that was
> leaving up a zookeeper ensemble -- yet to be found -- was root cause
> of sporadic TestTableMapReduce failings.
> ++ Clients could always timeout their session on zookeeper especially
> when the zk ensemble was restarted as part of a unit test
> (TestClusterRestart).  A timed-out client hosts stale data; i.e. it's
> not updatable by watchers.  Up until the new master commit, these
> session expirations were rarely troublesome; the stale data was
> usually sufficient to complete the test successfully.  Failures were
> rare but possible (With new master, there's more riding on zk watchers
> working so lost session should be more obvious).
> 
> + We can't let broke tests go unaddressed again.  If tests strike up a
> failing pattern in hudson we all get lazy about running tests at all.
> We lose the benefit unit tests bring where unit tests turn up the side
> effects not considered.  While the new master checkin was responsible
> for a portion of the failures of late, what has been interesting to me
> is how many of the recent test fails were not related at all.  There
> were tests that tested nought and failed (i.e. the putting up of two
> HBaseTestingUtilities in the one JVM but this doesn't work yet so test
> would hang on close), tests that were working under presumptions long
> since abandoned (TestMergeMeta wanted to do exactly that, merge meta,
> a facility we frustrated a while back), or tests that had been broken
> by a refactoring unrelated to new master (TestSplitTransaction had a
> means of distinguishing testing from normal running that was broke).
> 
> + A good few tests -- maybe 5 in the end -- were not completing when
> the test suite was run and maven would step in and kill them.  These
> tests prevented the tests behind them from running.  A few of these
> were checked in tests that could never have worked.  For example,
> TestDeadServers, a test I committed, was plain broke.  It could never
> have worked.  It looks like I checked in a version that was
> incomplete.  Or TestLoadBalancer was using an unresolvable hostname.
> How could that ever have worked?  Because of the tests that were not
> completing, hudson did not have a chance to flag the broke commits.
> I've changed the timeout on tests so we'll cut in after 15 minutes.
> It's also good practice to add the junit4 (timeout = N) qualification
> to the @Test annotation.  Set it to 3 or 5 minutes or something.
> Unfortunately, a bunch of our tests are still junit3 and the timeout
> is not an option (that I know of).
> 
> St.Ack
> P.S. I still love unit tests even when they are a pain.
> 
> 
> 
> On Fri, Oct 15, 2010 at 9:41 AM, Steven Noels
> <stevenn@outerthought.org> wrote:
> > This must be a mistake. |-)
> >
> > On Fri, Oct 15, 2010 at 9:52 AM, Apache Hudson Server <
> > hudson@hudson.apache.org> wrote:
> >
> >> See <https://hudson.apache.org/hudson/job/HBase-TRUNK/1551/changes>
> >>
> >>
> >>
> >
> >
> > --
> > Steven Noels
> > http://outerthought.org/
> > Open Source Content Applications
> > Makers of Kauri, Daisy CMS and Lily
> >
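
On the test-timeout point in the quoted message: in JUnit 4 the qualifier is
written @Test(timeout = N) with N in milliseconds, so 3 minutes is
@Test(timeout = 180000).  For JUnit 3 style tests, where that annotation is not
available, one possible workaround is to run the test body on a worker thread
and bound the wait.  The sketch below uses only java.util.concurrent; the class
and method names (TimeLimit, callWithTimeout) are hypothetical and are not part
of HBase or JUnit.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds how long a test body may run without
// relying on the JUnit 4 @Test(timeout = N) annotation.
final class TimeLimit {

    private TimeLimit() {}

    // Runs the body on a daemon worker thread and waits at most timeoutMs.
    // Throws AssertionError if the body has not finished in time; otherwise
    // returns the body's result or rethrows whatever the body threw.
    static <T> T callWithTimeout(Callable<T> body, long timeoutMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "time-limited-test-body");
            t.setDaemon(true); // a hung body must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(body);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the body may ignore it
            throw new AssertionError("test body did not finish within " + timeoutMs + " ms");
        } catch (ExecutionException e) {
            // Unwrap so the caller sees the original failure from the body.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Inside a JUnit 3 test method this might be called as
TimeLimit.callWithTimeout(() -> { runTheTest(); return null; }, 3 * 60 * 1000L),
where runTheTest() stands in for the existing test body.  Note that a body which
ignores interruption keeps running on its daemon thread after the timeout, so
this reports the hang but does not clean up after it.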
