hadoop-common-dev mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: upstream jenkins build broken?
Date Sun, 07 Jun 2015 03:39:45 GMT
Hi Folks!

After working on test-patch with other folks for the last few months, I
think we've reached the point where we can make the fastest progress
towards the goal of a general use pre-commit patch tester by spinning
things into a project focused on just that. I think we have a mature enough
code base and a sufficient fledgling community, so I'm going to put
together a tlp proposal.

Thanks for the feedback thus far from use within Hadoop. I hope we can
continue to make things more useful.

-Sean

On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <busbey@cloudera.com> wrote:

> HBase's dev-support folder is where the scripts and support files live.
> We've only recently started adding anything to the maven builds that's
> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
> add in more if we ran into the same permissions problems y'all are having.
>
> There's also our precommit job itself, though it isn't large[2]. AFAIK, we
> don't properly back this up anywhere, we just notify each other of changes
> on a particular mail thread[3].
>
> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
> red because I just finished fixing "mvn site" running out of permgen)
> [3]: http://s.apache.org/NT0
>
>
> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnauroth@hortonworks.com>
> wrote:
>
>> Sure, thanks Sean!  Do we just look in the dev-support folder in the HBase
>> repo?  Is there any additional context we need to be aware of?
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>>
>>
>>
>> On 3/11/15, 2:44 PM, "Sean Busbey" <busbey@cloudera.com> wrote:
>>
>> >+dev@hbase
>> >
>> >HBase has recently been cleaning up our precommit jenkins jobs to make them
>> >more robust. From what I can tell our stuff started off as an earlier
>> >version of what Hadoop uses for testing.
>> >
>> >Folks on either side open to an experiment of combining our precommit check
>> >tooling? In principle we should be looking for the same kinds of things.
>> >
>> >Naturally we'll still need different jenkins jobs to handle different
>> >resource needs and we'd need to figure out where stuff eventually lives,
>> >but that could come later.
>> >
>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>> >
>> >> The only thing I'm aware of is the failOnError option:
>> >>
>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>> >>
>> >> I prefer that we don't disable this, because ignoring different kinds of
>> >> failures could leave our build directories in an indeterminate state. For
>> >> example, we could end up with an old class file on the classpath for test
>> >> runs that was supposedly deleted.
>> >>
>> >> I think it's worth exploring Eddy's suggestion to try simulating failure
>> >> by placing a file where the code expects to see a directory.  That might
>> >> even let us enable some of these tests that are skipped on Windows,
>> >> because Windows allows access for the owner even after permissions have
>> >> been stripped.
>> >>
>> >> Chris Nauroth
>> >> Hortonworks
>> >> http://hortonworks.com/
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
>> >>
>> >> >Is there a maven plugin or setting we can use to simply remove
>> >> >directories that have no executable permissions on them?  Clearly we
>> >> >have the permission to do this from a technical point of view (since
>> >> >we created the directories as the jenkins user), it's simply that the
>> >> >code refuses to do it.
>> >> >
>> >> >Otherwise I guess we can just fix those tests...
>> >> >
>> >> >Colin
>> >> >
>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>> >> >>
>> >> >> In HDFS-7722:
>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
>> >> >> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>> >> >>
>> >> >> Also I ran mvn test several times on my machine and all tests passed.
>> >> >>
>> >> >> However, since in DiskChecker#checkDirAccess():
>> >> >>
>> >> >> private static void checkDirAccess(File dir) throws DiskErrorException {
>> >> >>   if (!dir.isDirectory()) {
>> >> >>     throw new DiskErrorException("Not a directory: "
>> >> >>                                  + dir.toString());
>> >> >>   }
>> >> >>
>> >> >>   checkAccessByFileMethods(dir);
>> >> >> }
>> >> >>
>> >> >> One potentially safer alternative is replacing data dir with a regular
>> >> >> file to simulate disk failures.
>> >> >>
>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>> >> >> <cnauroth@hortonworks.com> wrote:
>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> >> >>> TestDataNodeVolumeFailureReporting, and
>> >> >>> TestDataNodeVolumeFailureToleration all remove executable permissions from
>> >> >>> directories like the one Colin mentioned to simulate disk failures at data
>> >> >>> nodes.  I reviewed the code for all of those, and they all appear to be
>> >> >>> doing the necessary work to restore executable permissions at the end of
>> >> >>> the test.  The only recent uncommitted patch I've seen that makes changes
>> >> >>> in these test suites is HDFS-7722.  That patch still looks fine though.  I
>> >> >>> don't know if there are other uncommitted patches that changed these test
>> >> >>> suites.
>> >> >>>
>> >> >>> I suppose it's also possible that the JUnit process unexpectedly died
>> >> >>> after removing executable permissions but before restoring them.  That
>> >> >>> always would have been a weakness of these test suites, regardless of any
>> >> >>> recent changes.
>> >> >>>
>> >> >>> Chris Nauroth
>> >> >>> Hortonworks
>> >> >>> http://hortonworks.com/
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
>> >> >>>
>> >> >>>>Hey Colin,
>> >> >>>>
>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>> >> >>>>these boxes. He took a look and concluded that some perms are being set in
>> >> >>>>those directories by our unit tests which are precluding those files from
>> >> >>>>getting deleted. He's going to clean up the boxes for us, but we should
>> >> >>>>expect this to keep happening until we can fix the test in question to
>> >> >>>>properly clean up after itself.
>> >> >>>>
>> >> >>>>To help narrow down which commit it was that started this, Andrew sent me
>> >> >>>>this info:
>> >> >>>>
>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> >> >>>>has 500 perms, so I'm guessing that's the problem. Been that way since 9:32
>> >> >>>>UTC on March 5th."
>> >> >>>>
>> >> >>>>--
>> >> >>>>Aaron T. Myers
>> >> >>>>Software Engineer, Cloudera
>> >> >>>>
>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org>
>> >> >>>>wrote:
>> >> >>>>
>> >> >>>>> Hi all,
>> >> >>>>>
>> >> >>>>> A very quick (and not thorough) survey shows that I can't find any
>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
>> >> >>>>> to be failing with some variant of this message:
>> >> >>>>>
>> >> >>>>> [ERROR] Failed to execute goal
>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> >> >>>>> -> [Help 1]
>> >> >>>>>
>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>> >> >>>>> permissions?
>> >> >>>>>
>> >> >>>>> Colin
>> >> >>>>>
>> >> >>>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Lei (Eddy) Xu
>> >> >> Software Engineer, Cloudera
>> >>
>> >>
>> >
>> >
>> >--
>> >Sean
>>
>>
>
>
> --
> Sean
>



-- 
Sean
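
Editorial sketch, not part of the original thread: the quoted suggestion of simulating a disk failure by putting a regular file where a data directory is expected, instead of stripping permissions, might look roughly like the following. The class name, the helper simulateFailure, and the "data3.parked" name are hypothetical; checkDirAccess here only mirrors the isDirectory() check quoted above and throws a plain IOException rather than Hadoop's DiskErrorException. This is not the code from HDFS-7722.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SimulatedDiskFailure {

    // Stand-in for the isDirectory() check quoted from DiskChecker#checkDirAccess.
    // Assumption: a plain IOException replaces Hadoop's DiskErrorException.
    static void checkDirAccess(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IOException("Not a directory: " + dir);
        }
    }

    // Returns true if the check reports a failure while a regular file
    // occupies the data directory's path. The directory is put back in a
    // finally block, and no permissions are changed at any point.
    static boolean simulateFailure(Path root) throws IOException {
        Path dataDir = root.resolve("data3");
        Path parked = root.resolve("data3.parked"); // hypothetical holding name
        Files.createDirectory(dataDir);
        checkDirAccess(dataDir.toFile()); // healthy volume passes

        boolean failureDetected = false;
        try {
            Files.move(dataDir, parked);  // move the real directory aside
            Files.createFile(dataDir);    // a regular file takes its place
            try {
                checkDirAccess(dataDir.toFile());
            } catch (IOException expected) {
                failureDetected = true;
            }
        } finally {
            Files.deleteIfExists(dataDir);    // remove the placeholder file
            if (Files.exists(parked)) {
                Files.move(parked, dataDir);  // restore the directory
            }
        }
        checkDirAccess(dataDir.toFile()); // healthy again after restore
        return failureDetected;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("simulated-volume");
        System.out.println("failure detected: " + simulateFailure(root));
    }
}
```

Because nothing is ever chmod'ed, a test process that dies midway leaves behind only ordinary files and directories, which a later "mvn clean" should be able to delete.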
