From: "Colin P. McCabe"
To: "hdfs-dev@hadoop.apache.org"
Cc: "yarn-dev@hadoop.apache.org", Hadoop Common, "Colin P. McCabe", Kihwal Lee
Date: Wed, 18 Feb 2015 23:14:25 -0800
Subject: Re: Erratic Jenkins behavior

Hmm. I guess my thought would be that we would have a fixed number of
"slots" (i.e.
executors on a single node with associated .m2 directories). Then we
wouldn't clear each .m2 in between runs, but we would ensure that only
one slot at a time had access to each directory. In that case, build
times wouldn't increase that much (or really at all, until a dependency
changed... right?). When a dependency changed we'd have to do
O(N_slots) amount of work, but dependencies don't change that often.

Of course, the current situation also generates a lot of extra work
because people need to rekick builds that failed for mystery reasons.

cheers.
Colin

On Wed, Feb 18, 2015 at 9:53 AM, Chris Nauroth wrote:
> I'm pretty sure there is no guarantee of isolation on a shared
> .m2/repository directory for multiple concurrent Maven processes. I've
> had a theory for a while that one build running "mvn install" can
> overwrite the snapshot artifact that was just installed by another
> concurrent build. This can create bizarre problems, for example if a
> patch introduces a new class in hadoop-common and then references that
> class from hadoop-hdfs.
>
> I expect using completely separate work directories for .m2/repository,
> the patch directory, and the Jenkins workspace could resolve this. The
> typical cost for this kind of change is increased disk consumption and
> increased build time, since Maven would need to download dependencies
> fresh every time.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
> On 2/12/15, 2:00 PM, "Colin P. McCabe" wrote:
>
>>We could potentially use different .m2 directories for each executor.
>>I think this has been brought up in the past as well.
>>
>>I'm not sure how maven handles concurrent access to the .m2
>>directory... if it's not using flock or fcntl then it's not really
>>safe. This might explain some of our missing class error issues.
>>
>>Colin
>>
>>On Tue, Feb 10, 2015 at 2:13 AM, Steve Loughran wrote:
>>> Mvn is a dark mystery to us all.
>>> I wouldn't trust it not to pick up things from other builds if they
>>> ended up published to ~/.m2/repository during the process
>>>
>>> On 9 February 2015 at 19:29:06, Colin P. McCabe
>>> (cmccabe@apache.org) wrote:
>>>
>>> I'm sorry, I don't have any insight into this. With regard to
>>> HADOOP-11084, I thought that $BUILD_URL would be unique for each
>>> concurrent build, which would prevent build artifacts from getting
>>> mixed up between jobs. Based on the value of PATCHPROCESS that Kihwal
>>> posted, perhaps this is not the case? Perhaps someone can explain how
>>> this is supposed to work (I am a Jenkins newbie).
>>>
>>> regards,
>>> Colin
>>>
>>> On Thu, Feb 5, 2015 at 10:42 AM, Yongjun Zhang wrote:
>>>> Thanks Kihwal for bringing this up.
>>>>
>>>> Seems related to:
>>>>
>>>> https://issues.apache.org/jira/browse/HADOOP-11084
>>>>
>>>> Hi Andrew/Arpit/Colin/Steve, you guys worked on this jira before, any
>>>> insight about the issue Kihwal described?
>>>>
>>>> Thanks.
>>>>
>>>> --Yongjun
>>>>
>>>> On Thu, Feb 5, 2015 at 9:49 AM, Kihwal Lee wrote:
>>>>
>>>>> I am sure many of us have seen strange jenkins behavior out of the
>>>>> precommit builds.
>>>>>
>>>>> - build artifacts missing
>>>>> - serving build artifact belonging to another build. This also causes
>>>>>   wrong precommit results to be posted on the bug.
>>>>> - etc.
>>>>>
>>>>> The latest one I saw is the disappearance of the unit test
>>>>> stdout/stderr file during a build. After a successful run of unit
>>>>> tests, the file vanished, so the script could not cat it. It looked
>>>>> like another build process had deleted it while this build was in
>>>>> progress.
>>>>>
>>>>> It might have something to do with the fact that the patch-dir is set
>>>>> like the following:
>>>>>
>>>>> PATCHPROCESS=/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/../patchprocess
>>>>>
>>>>> I don't have access to the jenkins build configs or the build machines,
>>>>> so I can't debug it further, but I think we need to take care of it
>>>>> sooner rather than later. Can anyone help?
>>>>>
>>>>> Kihwal
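
The "slots" idea from the top of this thread (a fixed number of executors, each with its own .m2 directory that only one build may use at a time) could be sketched roughly as below. This is a minimal sketch under stated assumptions, not a description of how the Apache Jenkins hosts are configured: acquire_slot, maven_command, the slot-N directory layout, and the slot.lock file name are all hypothetical. The only real interfaces relied on are Python's fcntl.flock (Unix only) and Maven's maven.repo.local property.

```python
import contextlib
import fcntl
import os
import tempfile


@contextlib.contextmanager
def acquire_slot(base_dir, n_slots):
    """Hold an exclusive lock on the first free slot; yield (index, repo path)."""
    for index in range(n_slots):
        slot_dir = os.path.join(base_dir, "slot-%d" % index)  # hypothetical layout
        repo_dir = os.path.join(slot_dir, "repository")
        os.makedirs(repo_dir, exist_ok=True)
        lock_file = open(os.path.join(slot_dir, "slot.lock"), "w")
        try:
            # Non-blocking exclusive lock: fails at once if another build holds it.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            lock_file.close()
            continue  # slot busy, try the next one
        try:
            yield index, repo_dir
        finally:
            # Release the slot even if the build raised an exception.
            fcntl.flock(lock_file, fcntl.LOCK_UN)
            lock_file.close()
        return
    raise RuntimeError("all %d slots are busy" % n_slots)


def maven_command(repo_dir, goal="install"):
    """Build a Maven argv that uses repo_dir instead of ~/.m2/repository."""
    return ["mvn", "-Dmaven.repo.local=" + repo_dir, goal]


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # stand-in for a per-node slots directory
    with acquire_slot(base, 2) as (index, repo_dir):
        print(index, " ".join(maven_command(repo_dir)))
```

Because each slot's repository is kept between runs, a build only re-downloads what changed, which is the O(N_slots) cost mentioned above. The lock guards the slot as a whole, so it does not depend on Maven doing any locking of its own.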