Mailing-List: contact hdfs-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-dev@hadoop.apache.org
MIME-Version: 1.0
In-Reply-To: <D10A1054.1BE90%cnauroth@hortonworks.com>
References: <437381758.227501.1423158571738.JavaMail.yahoo@mail.yahoo.com>
	<CAA0W1bSrZNK-eN=qKtvcG0yHPV_DF4JXTzey=9EXsiu3wLLuPA@mail.gmail.com>
	<CA+qbEUOWJ1ERfq2LA-hx7OD-kqz36wpZB1Afbu_3seA0t+n+Tg@mail.gmail.com>
	<etPan.54d9d9ca.542289ec.103a6@stevel.local>
	<CA+qbEUMNwAREyA6=D_BXWC2pt0t-MUTRCgqaL+5doGDPjUqZ2A@mail.gmail.com>
	<D10A1054.1BE90%cnauroth@hortonworks.com>
Date: Wed, 18 Feb 2015 23:14:25 -0800
Message-ID: 
 <CA+qbEUMSJckkXcWNo81v+KynyNE1FPySuVihY9j=R+4TSyyHcQ@mail.gmail.com>
Subject: Re: Erratic Jenkins behavior
From: "Colin P. McCabe" <cmccabe@apache.org>
To: "hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>
Cc: "yarn-dev@hadoop.apache.org" <yarn-dev@hadoop.apache.org>,
	Hadoop Common <common-dev@hadoop.apache.org>,
 "Colin P. McCabe" <cmccabe@apache.org>,
	Kihwal Lee <kihwal@yahoo-inc.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hmm.  I guess my thought would be that we would have a fixed number of
"slots" (i.e. executors on a single node with associated .m2
directories).  Then we wouldn't clear each .m2 in between runs, but we
would ensure that only one slot at a time had access to each
directory.

In that case, build times wouldn't increase that much (or really at
all, until a dependency changed... right?).  When a dependency changed
we'd have to do O(N_slots) amount of work, but dependencies don't
change that often.
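
To make the slot idea a bit more concrete, here is a rough sketch (not
anything that runs on our Jenkins machines today).  It assumes a
hypothetical layout of base_dir/slot-N/.m2/repository plus a per-slot
slot.lock file, and it uses flock so that only one build at a time can
hold a given slot.  The function names and directory names are made up
for illustration.

```python
import fcntl
import os
import tempfile


def acquire_slot(base_dir, n_slots):
    """Return (index, repo_path, lock_file) for the first free slot, or None.

    Each slot owns its own .m2/repository directory. A slot counts as
    "free" if we can take an exclusive, non-blocking flock on its
    lock file (slot.lock is a hypothetical name).
    """
    for i in range(n_slots):
        slot_dir = os.path.join(base_dir, "slot-%d" % i)
        repo = os.path.join(slot_dir, ".m2", "repository")
        # The repository is created once and never cleared between runs.
        os.makedirs(repo, exist_ok=True)
        lock_file = open(os.path.join(slot_dir, "slot.lock"), "w")
        try:
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Another build holds this slot; try the next one.
            lock_file.close()
            continue
        return i, repo, lock_file
    return None


def release_slot(lock_file):
    """Give the slot back so that a later build can reuse its .m2."""
    fcntl.flock(lock_file, fcntl.LOCK_UN)
    lock_file.close()
```

A build that gets a slot would then point Maven at that slot's
repository, e.g. with -Dmaven.repo.local=<repo_path>, and release the
slot when it finishes.  The lock is also dropped automatically if the
build process dies, since the kernel releases flock locks when the
file is closed.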

Of course, the current situation also generates a lot of extra work
because people need to rekick builds that failed for mystery reasons.

cheers.
Colin

On Wed, Feb 18, 2015 at 9:53 AM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> I'm pretty sure there is no guarantee of isolation on a shared
> .m2/repository directory for multiple concurrent Maven processes.  I've
> had a theory for a while that one build running "mvn install" can
> overwrite the snapshot artifact that was just installed by another
> concurrent build.  This can create bizarre problems, for example if a
> patch introduces a new class in hadoop-common and then references that
> class from hadoop-hdfs.
>
> I expect using completely separate work directories for .m2/repository,
> the patch directory, and the Jenkins workspace could resolve this.  The
> typical cost for this kind of change is increased disk consumption and
> increased build time, since Maven would need to download dependencies
> fresh every time.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
>
>
>
> On 2/12/15, 2:00 PM, "Colin P. McCabe" <cmccabe@apache.org> wrote:
>
>>We could potentially use different .m2 directories for each executor.
>>I think this has been brought up in the past as well.
>>
>>I'm not sure how Maven handles concurrent access to the .m2
>>directory... if it's not using flock or fcntl then it's not really
>>safe.  This might explain some of our missing class error issues.
>>
>>Colin
>>
>>On Tue, Feb 10, 2015 at 2:13 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>>> Mvn is a dark mystery to us all. I wouldn't trust it not to pick up things
>>> from other builds if they ended up published to ~/.m2/repository during
>>> the process.
>>>
>>>
>>>
>>> On 9 February 2015 at 19:29:06, Colin P. McCabe (cmccabe@apache.org) wrote:
>>>
>>> I'm sorry, I don't have any insight into this. With regard to
>>> HADOOP-11084, I thought that $BUILD_URL would be unique for each
>>> concurrent build, which would prevent build artifacts from getting
>>> mixed up between jobs. Based on the value of PATCHPROCESS that Kihwal
>>> posted, perhaps this is not the case? Perhaps someone can explain how
>>> this is supposed to work (I am a Jenkins newbie).
>>>
>>> regards,
>>> Colin
>>>
>>> On Thu, Feb 5, 2015 at 10:42 AM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>>>> Thanks Kihwal for bringing this up.
>>>>
>>>> Seems related to:
>>>>
>>>> https://issues.apache.org/jira/browse/HADOOP-11084
>>>>
>>>> Hi Andrew/Arpit/Colin/Steve, you guys worked on this jira before, any
>>>> insight about the issue Kihwal described?
>>>>
>>>> Thanks.
>>>>
>>>> --Yongjun
>>>>
>>>>
>>>> On Thu, Feb 5, 2015 at 9:49 AM, Kihwal Lee <kihwal@yahoo-inc.com.invalid> wrote:
>>>>
>>>>> I am sure many of us have seen strange jenkins behavior out of the
>>>>> precommit builds.
>>>>>
>>>>> - build artifacts missing
>>>>> - serving build artifact belonging to another build. This also causes
>>>>> wrong precommit results to be posted on the bug.
>>>>> - etc.
>>>>>
>>>>> The latest one I saw is the disappearance of the unit test stdout/stderr
>>>>> file during a build. After a successful run of unit tests, the file
>>>>> vanished, so the script could not cat it. It looked like another build
>>>>> process had deleted it while this build was in progress.
>>>>>
>>>>> It might have something to do with the fact that the patch-dir is set
>>>>> like the following:
>>>>>
>>>>>
>>>>> PATCHPROCESS=/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/../patchprocess
>>>>>
>>>>> I don't have access to the Jenkins build configs or the build machines,
>>>>> so I can't debug it further, but I think we need to take care of it sooner
>>>>> rather than later. Can anyone help?
>>>>>
>>>>> Kihwal
>>>>>
>