From: Sean Busbey
Date: Sat, 6 Jun 2015 22:39:45 -0500
Subject: Re: upstream jenkins build broken?
To: dev, Hadoop Common, "hdfs-dev@hadoop.apache.org"

Hi Folks!

After working on test-patch with other folks for the last few months, I
think we've reached the point where we can make the fastest progress
towards the goal of a general-use pre-commit patch tester by spinning
things into a project focused on just that.
I think we have a mature enough code base and enough of a fledgling
community, so I'm going to put together a TLP (top-level project)
proposal.

Thanks for the feedback thus far from use within Hadoop. I hope we can
continue to make things more useful.

-Sean

On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey wrote:

> HBase's dev-support folder is where the scripts and support files live.
> We've only recently started adding anything to the maven builds that's
> specific to jenkins[1]; so far it's diagnostic stuff, but that's where
> I'd add in more if we ran into the same permissions problems y'all are
> having.
>
> There's also our precommit job itself, though it isn't large[2]. AFAIK,
> we don't properly back this up anywhere; we just notify each other of
> changes on a particular mail thread[3].
>
> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
> red because I just finished fixing "mvn site" running out of permgen)
> [3]: http://s.apache.org/NT0
>
> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth wrote:
>
>> Sure, thanks Sean! Do we just look in the dev-support folder in the
>> HBase repo? Is there any additional context we need to be aware of?
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>> On 3/11/15, 2:44 PM, "Sean Busbey" wrote:
>>
>> > +dev@hbase
>> >
>> > HBase has recently been cleaning up our precommit jenkins jobs to
>> > make them more robust. From what I can tell, our stuff started off
>> > as an earlier version of what Hadoop uses for testing.
>> >
>> > Folks on either side open to an experiment of combining our
>> > precommit check tooling? In principle we should be looking for the
>> > same kinds of things.
>> >
>> > Naturally we'll still need different jenkins jobs to handle
>> > different resource needs, and we'd need to figure out where stuff
>> > eventually lives, but that could come later.
>> >
>> > On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth wrote:
>> >
>> >> The only thing I'm aware of is the failOnError option:
>> >>
>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>> >>
>> >> I prefer that we don't disable this, because ignoring different
>> >> kinds of failures could leave our build directories in an
>> >> indeterminate state. For example, we could end up with an old class
>> >> file on the classpath for test runs that was supposedly deleted.
>> >>
>> >> I think it's worth exploring Eddy's suggestion to try simulating
>> >> failure by placing a file where the code expects to see a
>> >> directory. That might even let us enable some of these tests that
>> >> are skipped on Windows, because Windows allows access for the owner
>> >> even after permissions have been stripped.
>> >>
>> >> Chris Nauroth
>> >> Hortonworks
>> >> http://hortonworks.com/
>> >>
>> >> On 3/11/15, 2:10 PM, "Colin McCabe" wrote:
>> >>
>> >> > Is there a maven plugin or setting we can use to simply remove
>> >> > directories that have no executable permissions on them? Clearly
>> >> > we have the permission to do this from a technical point of view
>> >> > (since we created the directories as the jenkins user); it's
>> >> > simply that the code refuses to do it.
>> >> >
>> >> > Otherwise I guess we can just fix those tests...
>> >> >
>> >> > Colin
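[Aside: for illustration, a minimal sketch of the workaround Colin
describes, re-granting owner permissions before deleting so that the
delete no longer fails. This is hypothetical code using only
java.io.File; it is not an existing Maven plugin or part of the Hadoop
build.]

    import java.io.File;

    public class ForceDelete {
      /**
       * Restore owner permissions on f, then delete it and any children.
       * Restoring the parent first matters: listing and removing entries
       * requires read/write/execute on the containing directory.
       */
      public static boolean forceDelete(File f) {
        f.setReadable(true);
        f.setWritable(true);
        f.setExecutable(true);
        File[] children = f.listFiles();  // null when f is a regular file
        if (children != null) {
          for (File child : children) {
            forceDelete(child);
          }
        }
        return f.delete();
      }

      public static void main(String[] args) {
        for (String path : args) {
          System.out.println(path + ": " + forceDelete(new File(path)));
        }
      }
    }

[Something equivalent, e.g. a recursive "chmod -R u+rwX" shell step,
could also run in the Jenkins job before maven-clean-plugin ever touches
the workspace.]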
>> >> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu wrote:
>> >> >
>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>> >> >>
>> >> >> In HDFS-7722:
>> >> >> The TestDataNodeVolumeFailureXXX tests reset data dir permissions
>> >> >> in tearDown(), and TestDataNodeHotSwapVolumes resets permissions
>> >> >> in a finally clause.
>> >> >>
>> >> >> Also, I ran mvn test several times on my machine and all tests
>> >> >> passed.
>> >> >>
>> >> >> However, since DiskChecker#checkDirAccess() rejects anything that
>> >> >> is not a directory:
>> >> >>
>> >> >> private static void checkDirAccess(File dir) throws DiskErrorException {
>> >> >>   if (!dir.isDirectory()) {
>> >> >>     throw new DiskErrorException("Not a directory: " + dir.toString());
>> >> >>   }
>> >> >>
>> >> >>   checkAccessByFileMethods(dir);
>> >> >> }
>> >> >>
>> >> >> one potentially safer alternative is replacing the data dir with
>> >> >> a regular file to simulate disk failures.
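[Aside: a minimal sketch of Eddy's file-in-place-of-a-directory idea,
assuming only java.io.File; the class and method names are hypothetical
and this is not the actual HDFS test code.]

    import java.io.File;
    import java.io.IOException;

    public class SimulatedVolumeFailure {
      /**
       * Replace a data directory with a same-named regular file. A check
       * in the spirit of checkDirAccess() above then fails with "Not a
       * directory", yet no permission bits are touched.
       */
      static void simulateFailure(File dataDir) throws IOException {
        if (!deleteRecursively(dataDir)) {
          throw new IOException("could not remove " + dataDir);
        }
        if (!dataDir.createNewFile()) {
          throw new IOException("could not create placeholder " + dataDir);
        }
      }

      static boolean deleteRecursively(File f) {
        File[] children = f.listFiles();  // null when f is a regular file
        if (children != null) {
          for (File child : children) {
            deleteRecursively(child);
          }
        }
        return f.delete();
      }
    }

[Because nothing ever strips permission bits, a JUnit process that dies
mid-test leaves nothing behind that "mvn clean" cannot delete.]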
>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth wrote:
>> >> >>
>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >> >>> TestDataNodeVolumeFailureReporting, and
>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
>> >> >>> permissions from directories like the one Colin mentioned to
>> >> >>> simulate disk failures at data nodes. I reviewed the code for all
>> >> >>> of those, and they all appear to be doing the necessary work to
>> >> >>> restore executable permissions at the end of the test. The only
>> >> >>> recent uncommitted patch I've seen that makes changes in these
>> >> >>> test suites is HDFS-7722. That patch still looks fine though. I
>> >> >>> don't know if there are other uncommitted patches that changed
>> >> >>> these test suites.
>> >> >>>
>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
>> >> >>> died after removing executable permissions but before restoring
>> >> >>> them. That always would have been a weakness of these test
>> >> >>> suites, regardless of any recent changes.
>> >> >>>
>> >> >>> Chris Nauroth
>> >> >>> Hortonworks
>> >> >>> http://hortonworks.com/
>> >> >>>
>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" wrote:
>> >> >>>
>> >> >>>> Hey Colin,
>> >> >>>>
>> >> >>>> I asked Andrew Bayer, who works with Apache Infra, what's going
>> >> >>>> on with these boxes. He took a look and concluded that some
>> >> >>>> perms are being set in those directories by our unit tests which
>> >> >>>> are precluding those files from getting deleted. He's going to
>> >> >>>> clean up the boxes for us, but we should expect this to keep
>> >> >>>> happening until we can fix the test in question to properly
>> >> >>>> clean up after itself.
>> >> >>>>
>> >> >>>> To help narrow down which commit it was that started this,
>> >> >>>> Andrew sent me this info:
>> >> >>>>
>> >> >>>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> >> >>>> has 500 perms, so I'm guessing that's the problem. Been that way
>> >> >>>> since 9:32 UTC on March 5th."
>> >> >>>>
>> >> >>>> --
>> >> >>>> Aaron T. Myers
>> >> >>>> Software Engineer, Cloudera
>> >> >>>>
>> >> >>>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe wrote:
>> >> >>>>
>> >> >>>>> Hi all,
>> >> >>>>>
>> >> >>>>> A very quick (and not thorough) survey shows that I can't find
>> >> >>>>> any jenkins jobs that succeeded from the last 24 hours. Most of
>> >> >>>>> them seem to be failing with some variant of this message:
>> >> >>>>>
>> >> >>>>> [ERROR] Failed to execute goal
>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> >> >>>>> -> [Help 1]
>> >> >>>>>
>> >> >>>>> Any ideas how this happened? Bad disk, unit test setting wrong
>> >> >>>>> permissions?
>> >> >>>>>
>> >> >>>>> Colin
>> >> >>
>> >> >> --
>> >> >> Lei (Eddy) Xu
>> >> >> Software Engineer, Cloudera
>>
>> > --
>> > Sean
>
> --
> Sean

--
Sean