hbase-dev mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
Date Tue, 16 Jun 2015 09:30:50 GMT
How about "harbinger" for a name :)

On Sunday, June 7, 2015, Sean Busbey <busbey@cloudera.com> wrote:

> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>
>
>
> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
> > Hi Folks!
> >
> > After working on test-patch with other folks for the last few months, I
> > think we've reached the point where we can make the fastest progress
> > towards the goal of a general-use pre-commit patch tester by spinning
> > things into a project focused on just that. I think we have a mature
> > enough code base and a sufficient fledgling community, so I'm going to
> > put together a TLP proposal.
> >
> > Thanks for the feedback thus far from use within Hadoop. I hope we can
> > continue to make things more useful.
> >
> > -Sean
> >
> > On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <busbey@cloudera.com> wrote:
> >
> >> HBase's dev-support folder is where the scripts and support files live.
> >> We've only recently started adding anything to the maven builds that's
> >> specific to jenkins[1]; so far it's diagnostic stuff, but that's where
> >> I'd add in more if we ran into the same permissions problems y'all are
> >> having.
> >>
> >> There's also our precommit job itself, though it isn't large[2]. AFAIK,
> >> we don't properly back this up anywhere; we just notify each other of
> >> changes on a particular mail thread[3].
> >>
> >> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> >> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
> >> red because I just finished fixing "mvn site" running out of permgen)
> >> [3]: http://s.apache.org/NT0
> >>
> >>
> >> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnauroth@hortonworks.com>
> >> wrote:
> >>
> >>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
> >>> HBase repo?  Is there any additional context we need to be aware of?
> >>>
> >>> Chris Nauroth
> >>> Hortonworks
> >>> http://hortonworks.com/
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 3/11/15, 2:44 PM, "Sean Busbey" <busbey@cloudera.com> wrote:
> >>>
> >>> >+dev@hbase
> >>> >
> >>> >HBase has recently been cleaning up our precommit jenkins jobs to make
> >>> >them more robust. From what I can tell our stuff started off as an
> >>> >earlier version of what Hadoop uses for testing.
> >>> >
> >>> >Folks on either side open to an experiment of combining our precommit
> >>> >check tooling? In principle we should be looking for the same kinds of
> >>> >things.
> >>> >
> >>> >Naturally we'll still need different jenkins jobs to handle different
> >>> >resource needs and we'd need to figure out where stuff eventually
> >>> >lives, but that could come later.
> >>> >
> >>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnauroth@hortonworks.com>
> >>> >wrote:
> >>> >
> >>> >> The only thing I'm aware of is the failOnError option:
> >>> >>
> >>> >>
> >>> >>
> >>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
> >>> >>
> >>> >>
> >>> >> I prefer that we don't disable this, because ignoring different kinds
> >>> >> of failures could leave our build directories in an indeterminate
> >>> >> state.  For example, we could end up with an old class file on the
> >>> >> classpath for test runs that was supposedly deleted.
> >>> >>
> >>> >> I think it's worth exploring Eddy's suggestion to try simulating
> >>> >> failure by placing a file where the code expects to see a directory.
> >>> >> That might even let us enable some of these tests that are skipped on
> >>> >> Windows, because Windows allows access for the owner even after
> >>> >> permissions have been stripped.
> >>> >>
> >>> >> Chris Nauroth
> >>> >> Hortonworks
> >>> >> http://hortonworks.com/
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
> >>> >>
> >>> >> >Is there a maven plugin or setting we can use to simply remove
> >>> >> >directories that have no executable permissions on them?  Clearly we
> >>> >> >have the permission to do this from a technical point of view (since
> >>> >> >we created the directories as the jenkins user), it's simply that
> >>> >> >the code refuses to do it.
> >>> >> >
> >>> >> >Otherwise I guess we can just fix those tests...
> >>> >> >
> >>> >> >Colin
> >>> >> >
> >>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
> >>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
> >>> >> >>
> >>> >> >> In HDFS-7722:
> >>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >>> >> >> TearDown().
> >>> >> >> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
> >>> >> >>
> >>> >> >> Also I ran mvn test several times on my machine and all tests
> >>> >> >> passed.
> >>> >> >>
> >>> >> >> However, DiskChecker#checkDirAccess() rejects anything that is not
> >>> >> >> a directory:
> >>> >> >>
> >>> >> >> private static void checkDirAccess(File dir)
> >>> >> >>     throws DiskErrorException {
> >>> >> >>   if (!dir.isDirectory()) {
> >>> >> >>     throw new DiskErrorException("Not a directory: "
> >>> >> >>                                  + dir.toString());
> >>> >> >>   }
> >>> >> >>
> >>> >> >>   checkAccessByFileMethods(dir);
> >>> >> >> }
> >>> >> >>
> >>> >> >> So one potentially safer alternative is replacing the data dir with
> >>> >> >> a regular file to simulate disk failures.
> >>> >> >>
> >>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> >>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>> >> >>> TestDataNodeVolumeFailureReporting, and
> >>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
> >>> >> >>> permissions from directories like the one Colin mentioned to
> >>> >> >>> simulate disk failures at data nodes.  I reviewed the code for all
> >>> >> >>> of those, and they all appear to be doing the necessary work to
> >>> >> >>> restore executable permissions at the end of the test.  The only
> >>> >> >>> recent uncommitted patch I've seen that makes changes in these
> >>> >> >>> test suites is HDFS-7722.  That patch still looks fine though.  I
> >>> >> >>> don't know if there are other uncommitted patches that changed
> >>> >> >>> these test suites.
> >>> >> >>>
> >>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
> >>> >> >>> died after removing executable permissions but before restoring
> >>> >> >>> them.  That always would have been a weakness of these test suites,
> >>> >> >>> regardless of any recent changes.
> >>> >> >>>
> >>> >> >>> Chris Nauroth
> >>> >> >>> Hortonworks
> >>> >> >>> http://hortonworks.com/
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
> >>> >> >>>
> >>> >> >>>>Hey Colin,
> >>> >> >>>>
> >>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
> >>> >> >>>>with these boxes. He took a look and concluded that some perms are
> >>> >> >>>>being set in those directories by our unit tests which are
> >>> >> >>>>precluding those files from getting deleted. He's going to clean up
> >>> >> >>>>the boxes for us, but we should expect this to keep happening until
> >>> >> >>>>we can fix the test in question to properly clean up after itself.
> >>> >> >>>>
> >>> >> >>>>To help narrow down which commit it was that started this, Andrew
> >>> >> >>>>sent me this info:
> >>> >> >>>>
> >>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> >>> >> >>>>has 500 perms, so I'm guessing that's the problem. Been that way
> >>> >> >>>>since 9:32 UTC on March 5th."
> >>> >> >>>>
> >>> >> >>>>--
> >>> >> >>>>Aaron T. Myers
> >>> >> >>>>Software Engineer, Cloudera
> >>> >> >>>>
> >>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org>
> >>> >> >>>>wrote:
> >>> >> >>>>
> >>> >> >>>>> Hi all,
> >>> >> >>>>>
> >>> >> >>>>> A very quick (and not thorough) survey shows that I can't find
> >>> >> >>>>> any jenkins jobs that succeeded from the last 24 hours.  Most of
> >>> >> >>>>> them seem to be failing with some variant of this message:
> >>> >> >>>>>
> >>> >> >>>>> [ERROR] Failed to execute goal
> >>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
> >>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
> >>> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>> >> >>>>> -> [Help 1]
> >>> >> >>>>>
> >>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
> >>> >> >>>>> permissions?
> >>> >> >>>>>
> >>> >> >>>>> Colin
> >>> >> >>>>>
> >>> >> >>>
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >> --
> >>> >> >> Lei (Eddy) Xu
> >>> >> >> Software Engineer, Cloudera
> >>> >>
> >>> >>
> >>> >
> >>> >
> >>> >--
> >>> >Sean
> >>>
> >>>
> >>
> >>
> >> --
> >> Sean
> >>
> >
> >
> >
> > --
> > Sean
> >
>
>
>
> --
> Sean
>


-- 
// Jonathan Hsieh (shay)
// HBase Tech Lead, Software Engineer, Cloudera
// jon@cloudera.com // @jmhsieh
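
The idea raised earlier in the quoted thread by Lei (Eddy) Xu, replacing a data directory with a regular file so that an isDirectory()-style check fails without any permission changes, might look roughly like the sketch below. It is only an illustration under stated assumptions: the class name DirAsFileSketch, the helper failsWhenReplacedByFile, and the use of IOException in place of Hadoop's DiskErrorException are hypothetical and are not taken from the Hadoop or HBase code bases.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DirAsFileSketch {

    /** Hypothetical stand-in for the directory check quoted in the thread. */
    public static void checkDirAccess(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IOException("Not a directory: " + dir);
        }
    }

    /**
     * Swaps an empty directory for a regular file, runs the check, and
     * always restores the directory afterwards. Returns true if the check
     * reported a failure while the file was in place.
     */
    public static boolean failsWhenReplacedByFile(Path dataDir) throws IOException {
        Files.delete(dataDir);       // assumption: the directory is empty
        Files.createFile(dataDir);   // a regular file now occupies the path
        try {
            checkDirAccess(dataDir.toFile());
            return false;
        } catch (IOException expected) {
            return true;
        } finally {
            // Restore the original layout; no permission bits were touched,
            // so a later "mvn clean" can still delete everything.
            Files.delete(dataDir);
            Files.createDirectory(dataDir);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("volume-failure-sketch");
        Path dataDir = Files.createDirectory(root.resolve("data3"));
        try {
            System.out.println("failure simulated: " + failsWhenReplacedByFile(dataDir));
            System.out.println("directory restored: " + Files.isDirectory(dataDir));
        } finally {
            Files.delete(dataDir);
            Files.delete(root);
        }
    }
}
```

Because no permissions are changed, a test process that dies between the swap and the restore would leave behind an ordinary file rather than a directory the clean step cannot remove, which is the weakness discussed in the thread.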
