Date: Tue, 22 Sep 2015 08:39:18 -0700
Message-ID: 
 <CA+qbEUPUxO=ruxHGowDAVxHXtmUOEqa_Wpjgw6+h6v_TsUMueA@mail.gmail.com>
Subject: Re: Local repo sharing for maven builds
From: "Colin P. McCabe" <cmccabe@apache.org>
To: Hadoop Common <common-dev@hadoop.apache.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Mon, Sep 21, 2015 at 4:08 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>
>> On 19 Sep 2015, at 04:42, Allen Wittenauer <aw@altiscale.com> wrote:
>>
>> a) Multi-module patches are always troublesome because they make the
>> test system do significantly more work.  For Yetus, we've pared it
>> down as far as we can go to get *some* speed increases, but if a patch
>> does something like hit every pom.xml file, there's nothing that can
>> be done to make it better other than splitting up the patch.
>>
>> b) It's worth noting that it happens more often to HDFS patches
>> because HDFS unit tests take too damn long.  Some individual tests
>> take 10 minutes!  They invariably collide with the various full builds
>> (NOT pre commit! Those other things that Steve pointed out that we're
>> ignoring).  While Yetus has support for running unit tests in
>> parallel, Hadoop does not.
>
>
> I think the main thing I've been complaining about is how we ignore
> failing scheduled Jenkins runs; it's been so unreliable that we all
> ignore the constant background noise of Jenkins failures. That's
> compounded by how some test runs (hello Yarn-precommit!) send Jenkins
> mails to the dev list. (I've turned that off now: if you get Jenkins
> failures on yarn-dev then it's from the regular ones.)

Yes, we need truly repeatable builds.  It is a big problem that we
don't have them right now!

>
>>
>> c) mvn install is pretty much required for a not insignificant number
>> of multi-module patches, esp. if they hit hadoop-common.  For a large
>> chunk of "oh just make it one patch", it's effectively a death
>> sentence on the Jenkins side.
>
> The race conditions have existed for a long, long time. They only
> surface when you have a patch that spans artifacts and is either
> (1) incompatible across builds or (2) needs to be synced across builds
> to work. If things still linked up, you'd have the race *but you
> wouldn't notice*. It's only the artifact-spanning patches that expose
> it.
>
> YARN has had this for a while, but its builds are shorter; it's HDFS
> that's the problem, for the reasons AW has noted:
> - there's now >1 JAR
> - it takes a long time to build and test, so host conflict is inevitable.
>
>
> There is one tactic not yet looked at: have every build set its own
> Hadoop version, e.g. instead of all precommits being
> hadoop-3.0.0-SNAPSHOT, they could be hadoop-3.0.0-JIRA-4313-SNAPSHOT.
> No conflict, just the need to schedule a run that cleans up the m2
> repo every night. If timestamped version numbers are used
> (hadoop-3.0.0-2015-09-21-11:38) then the job can make better decisions
> about what to purge. Test runs could even rm their own artifacts
> afterwards, perhaps.
>
> I think this would be the best way to isolate builds: no need for
> private repos (which bring the follow-on need to download the entire
> repo on every run), yet 100% isolation.
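
To make the timestamped-version idea above concrete, here is a rough
sketch of how a nightly cleanup job could decide what to purge. The
function names, the 12-hour cutoff in the usage note, and the exact
timestamp format (copied from the example version string) are all
assumptions for illustration; nothing like this exists in the build
today.

```python
from datetime import datetime, timedelta

# Assumed format, matching the example "3.0.0-2015-09-21-11:38".
STAMP_FORMAT = "%Y-%m-%d-%H:%M"


def timestamped_version(base, when):
    """Build a per-run version string from a base version and a time."""
    return f"{base}-{when.strftime(STAMP_FORMAT)}"


def parse_stamp(version, base):
    """Return the embedded timestamp, or None if the version has none."""
    prefix = base + "-"
    if not version.startswith(prefix):
        return None
    try:
        return datetime.strptime(version[len(prefix):], STAMP_FORMAT)
    except ValueError:
        # e.g. "3.0.0-SNAPSHOT": not a timestamped build, leave it alone
        return None


def versions_to_purge(versions, base, now, max_age=timedelta(days=1)):
    """Pick the timestamped versions that are older than max_age."""
    stale = []
    for version in versions:
        stamp = parse_stamp(version, base)
        if stamp is not None and now - stamp > max_age:
            stale.append(version)
    return stale
```

With a 12-hour cutoff, a build stamped 2015-09-21-11:38 would be purged
by a job running the next morning, while plain SNAPSHOT versions are
never touched because they carry no timestamp.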

Did anyone address Andrew's proposal to have one private repo per
Jenkins executor?  That seems like the simplest approach to me.  It
seems like that would only generate more network traffic in the case
where a dependency changes, which should be relatively rare.
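
As a minimal sketch of the per-executor idea, assuming the job can read
Jenkins' EXECUTOR_NUMBER environment variable and pass Maven's
maven.repo.local property on the command line: the helper name and the
repo root path below are hypothetical, and this is not how the current
jobs are configured.

```python
import os


def maven_command(goals, env=None, repo_root="/tmp/m2-repos"):
    """Build an mvn command line that uses one local repo per executor.

    repo_root is a hypothetical location; EXECUTOR_NUMBER is the
    variable Jenkins sets for each executor on a node.
    """
    env = os.environ if env is None else env
    # Fall back to "0" when not running under Jenkins.
    executor = env.get("EXECUTOR_NUMBER", "0")
    repo = f"{repo_root}/executor-{executor}"
    return ["mvn", f"-Dmaven.repo.local={repo}", *goals]
```

Two builds running at once on the same node would then install into
different directories, so one build's mvn install could not overwrite
artifacts the other is in the middle of reading.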

It would be nice to combine this with Dockerization so that we can
finally stop worrying about rogue build machines that lack all the
dependencies, or chasing down infra whenever a new dependency is
added.

>
> The other issue with race conditions is port assignments: too much
> code with hard-coded ports. There's been slow work on that, with
> Brahma Reddy Battula deserving special mention here. But it's almost a
> losing battle, chasing where the next hard-coded port goes in, and
> again, it leads to unreliable test runs that everyone ignores.
>
>
> ANNOUNCEMENT: new patches which contain hard-coded ports in test runs
> will henceforth be reverted. Jenkins matters more than the 30s of your
> time it takes to use the free port finder methods. Same for any
> hard-coded paths in filesystems.

+1.  Can you add this to HowToContribute on the wiki?  Or should we
vote on it first?
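
For anyone unfamiliar with the technique, the usual trick behind a free
port finder is to bind to port 0 and let the OS choose. The sketch
below is a generic illustration in Python, not the Hadoop helper
itself, and the function name is made up.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]
```

There is still a small window between closing this socket and the test
binding the port, so where the server under test accepts port 0
directly, that is the safer option.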

>
>
>>
>> d) I'm a big fan of d.
>>
>> e) File a bug against Yetus and we'll add the ability to set
>> ant/gradle/maven args from the command line.  I thought I had it in
>> there when I rewrote the support for multiple build tools, gradle,
>> etc, but I clearly dropped it on the floor.
>
> People won't do that. Switching to per-run hadoop version numbers
> should suffice for artifact dependencies, leaving only ports and paths.
>>
>> f) Any time you "give the option to the patch submitter", you generate
>> a not insignificant amount of work on the test infrastructure to
>> determine intent because it effectively means implementing some
>> parsing of a comment.  It's not particularly easy because humans
>> rarely follow the rules.  Just see how good we are at following the
>> Hadoop Compatibility Guidelines. Har har.  No really: people still
>> struggle with filling in JIRA headers correctly and naming patches to
>> trigger the appropriate branch for the test.

I agree... we do not want a build option to provide what should be
basic functionality (doing the build correctly).

best,
Colin

>
> where's that documented BTW? I did try looking for it at the weekend..
>
>
>>
>> g) It's worth noting that Hadoop trunk is *not* using the latest
>> test-patch code.  So there are some significant improvements on the
>> way as soon as we get a release out the door.
>>
>>
>
>
> well, get on with it then :)
>
> I'm going to be at ApacheCon Data EU next week; who else will be? Maybe
> we could make it a goal of the conference to come out of the week with
> Jenkins building reliably. I've been looking at it at weekends but
> don't have time in the week.
>
>