hadoop-common-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Local repo sharing for maven builds
Date Mon, 21 Sep 2015 11:08:31 GMT

> On 19 Sep 2015, at 04:42, Allen Wittenauer <aw@altiscale.com> wrote:
> 
> a) Multi-module patches are always troublesome because they make the test system do significantly more work. For Yetus, we've pared it down as far as we can go to get *some* speed increases, but if a patch does something like hit every pom.xml file, there's nothing that can be done to make it better other than splitting up the patch.
> 
> b) It's worth noting that it happens more often to HDFS patches because HDFS unit tests take too damn long. Some individual tests take 10 minutes! They invariably collide with the various full builds (NOT precommit! Those other things that Steve pointed out that we're ignoring). While Yetus has support for running unit tests in parallel, Hadoop does not.



I think the main thing I've been complaining about is how we ignore failing scheduled Jenkins runs; Jenkins has been so unreliable that we all ignore the constant background noise of its failures. That's compounded by how some test runs (hello Yarn-precommit!) send Jenkins mails to the dev- list. (I've turned that off now: if you get Jenkins failures on yarn-dev then it's from the regular builds.)

> 
> c) mvn install is pretty much required for a not-insignificant number of multi-module patches, esp. if they hit hadoop-common. For a large chunk of "oh just make it one patch", it's effectively a death sentence on the Jenkins side.

The race conditions have existed for a long, long time. They only surface when you have a patch that spans artifacts and is either (1) incompatible across builds or (2) in need of being synced across builds to work. If things still linked up, you'd have the race *but you wouldn't notice*. It's only the artifact-spanning patches which surface it.

YARN has had this for a while, but its builds are shorter; it's HDFS that's the problem, for the reasons AW noted:
- there's now >1 JAR
- it takes a long time to build and test, so host conflict is inevitable.


There is one tactic not yet looked at: have every build set its own Hadoop version. E.g. instead of all precommits being hadoop-3.0.0-SNAPSHOT, they could be hadoop-3.0.0-JIRA-4313-SNAPSHOT. No conflict, just the need to schedule a job that cleans up the m2 repo every night. If timestamped version numbers are used (hadoop-3.0.0-2015-09-21-11:38) then that job can make better decisions about what to purge. Test runs could even rm their own artifacts afterwards, perhaps.
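
To make that concrete: a minimal sketch of what a precommit wrapper might run, assuming the versions-maven-plugin is available, and with the JIRA id and retention period purely as placeholders:

    # give this run its own version so parallel installs can't collide
    mvn versions:set -DnewVersion=3.0.0-JIRA-4313-SNAPSHOT -DgenerateBackupPoms=false
    mvn install -DskipTests

    # nightly cleanup (illustrative): purge per-run snapshot dirs older than a day
    find ~/.m2/repository/org/apache/hadoop -type d -name '*-JIRA-*-SNAPSHOT' \
      -mtime +1 -prune -exec rm -rf {} +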

I think this would be the best way to isolate: 100% isolation, with no need for private repos and their follow-on need to download the entire repo on every run.

The other issue with race conditions is port assignments: too much code with hard-coded ports. There's been slow work on that, with Brahma Reddy Battula deserving special mention here, but it's almost a losing battle, chasing where the next hard-coded port goes in; and again, it leads to unreliable test runs that everyone ignores.


ANNOUNCEMENT: new patches which contain hard-coded ports in tests will henceforth be reverted. Jenkins matters more than the 30s of your time it takes to use the free-port-finder methods. The same goes for any hard-coded paths in filesystems.
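
For anyone without a free-port finder to hand, the usual trick is binding to port 0 and letting the kernel pick; a minimal sketch of the idea (not the exact helper Hadoop ships):

    import java.io.IOException;
    import java.net.ServerSocket;

    public class PortFinder {
      /**
       * Ask the kernel for a free ephemeral port by binding to port 0.
       * There's still a small race before the caller rebinds the port,
       * so tests should retry if their own bind subsequently fails.
       */
      public static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
          socket.setReuseAddress(true);
          return socket.getLocalPort();
        }
      }
    }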


> 
> d) I'm a big fan of d. 
> 
> e) File a bug against Yetus and we'll add the ability to set ant/gradle/maven args from the command line. I thought I had it in there when I rewrote the support for multiple build tools, gradle, etc., but I clearly dropped it on the floor.

People won't do that. Switching to per-run Hadoop version numbers should suffice for artifact dependencies, leaving only ports and paths.
> 
> f) Any time you "give the option to the patch submitter", you generate a not-insignificant amount of work on the test infrastructure to determine intent, because it effectively means implementing some parsing of a comment. It's not particularly easy, because humans rarely follow the rules. Just see how well we are at following the Hadoop Compatibility Guidelines. Har har. No really: people still struggle with filling in JIRA headers correctly and naming patches to trigger the appropriate branch for the test.

Where's that documented, BTW? I did try looking for it at the weekend...


> 
> g) It's worth noting that Hadoop trunk is *not* using the latest test-patch code. So there are some significant improvements on the way as soon as we get a release out the door.
> 
> 


Well, get on with it then :)

I'm going to be at ApacheCon Data EU next week; who else will be there? Maybe we could make it a goal of the conference to come out of the week with Jenkins building reliably. I've been looking at it at weekends, but don't have time during the week.

