hbase-dev mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: upstream jenkins build broken?
Date Wed, 11 Mar 2015 22:16:37 GMT
HBase's dev-support folder is where the scripts and support files live.
We've only recently started adding anything to the maven builds that's
specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
add in more if we ran into the same permissions problems y'all are having.

There's also our precommit job itself, though it isn't large[2]. AFAIK, we
don't properly back this up anywhere, we just notify each other of changes
on a particular mail thread[3].

[1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
[2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all red
because I just finished fixing "mvn site" running out of permgen)
[3]: http://s.apache.org/NT0


On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnauroth@hortonworks.com>
wrote:

> Sure, thanks Sean!  Do we just look in the dev-support folder in the HBase
> repo?  Is there any additional context we need to be aware of?
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
>
>
>
> On 3/11/15, 2:44 PM, "Sean Busbey" <busbey@cloudera.com> wrote:
>
> >+dev@hbase
> >
> >HBase has recently been cleaning up our precommit jenkins jobs to make
> >them more robust. From what I can tell our stuff started off as an
> >earlier version of what Hadoop uses for testing.
> >
> >Folks on either side open to an experiment of combining our precommit
> >check tooling? In principle we should be looking for the same kinds of
> >things.
> >
> >Naturally we'll still need different jenkins jobs to handle different
> >resource needs and we'd need to figure out where stuff eventually lives,
> >but that could come later.
> >
> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnauroth@hortonworks.com>
> >wrote:
> >
> >> The only thing I'm aware of is the failOnError option:
> >>
> >>
> >>
> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
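For reference, the option in question is set in the plugin's configuration in the pom; a minimal sketch (plugin placement and version are illustrative, and note this is exactly the setting Chris argues against flipping):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <configuration>
    <!-- When set to false, failures to delete files during "mvn clean"
         are logged but do not fail the build (not recommended here). -->
    <failOnError>false</failOnError>
  </configuration>
</plugin>
```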
> >>
> >>
> >> I prefer that we don't disable this, because ignoring different kinds of
> >> failures could leave our build directories in an indeterminate state.
> >> For example, we could end up with an old class file on the classpath
> >> for test runs that was supposedly deleted.
> >>
> >> I think it's worth exploring Eddy's suggestion to try simulating failure
> >> by placing a file where the code expects to see a directory.  That might
> >> even let us enable some of these tests that are skipped on Windows,
> >> because Windows allows access for the owner even after permissions have
> >> been stripped.
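The file-in-place-of-a-directory approach suggested above can be sketched roughly as follows (a standalone illustration, not the actual HDFS test code; path names are made up):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FakeDiskFailure {
    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("dfs-test");
        Path dataDir = base.resolve("data3");
        Files.createDirectory(dataDir);

        // Simulate a disk failure: replace the data directory with a
        // regular file of the same name, instead of stripping permissions.
        Files.delete(dataDir);
        Files.createFile(dataDir);

        // A check like DiskChecker#checkDirAccess would now reject this
        // path because it is not a directory -- no permission changes
        // involved, so nothing is left behind for "mvn clean" to choke on.
        System.out.println("isDirectory = " + dataDir.toFile().isDirectory());

        // Cleanup is an ordinary delete; nothing special to restore.
        Files.delete(dataDir);
        Files.delete(base);
    }
}
```

Because the trick never touches mode bits, it should behave the same on Windows, which is why it might let some currently skipped tests run there.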
> >>
> >> Chris Nauroth
> >> Hortonworks
> >> http://hortonworks.com/
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
> >>
> >> >Is there a maven plugin or setting we can use to simply remove
> >> >directories that have no executable permissions on them?  Clearly we
> >> >have permission to do this from a technical point of view (since we
> >> >created the directories as the jenkins user); it's simply that the
> >> >code refuses to do it.
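As far as I know there is no stock plugin option for this, but the idea is simple to sketch outside Maven: since the same user owns the directories, a recursive walk can restore the mode bits before deleting (illustrative code, not an existing mojo; the directory names mimic the broken state on the build slaves):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ForceClean {
    public static void main(String[] args) throws IOException {
        // Recreate the broken state: a subdirectory tree whose middle
        // directory has had its executable bit stripped by a test.
        File base = Files.createTempDirectory("force-clean").toFile();
        File data = new File(base, "data3");
        File inner = new File(data, "current");
        inner.mkdirs();
        data.setExecutable(false, false);

        forceDelete(base);
        System.out.println("deleted = " + !base.exists());
    }

    // Restore mode bits first, then delete.  This works because the user
    // that created the directories is also the one doing the deleting;
    // only the permission bits stand in the way.
    static void forceDelete(File f) throws IOException {
        if (f.isDirectory()) {
            f.setExecutable(true, false);  // allow traversal
            f.setReadable(true, false);    // allow listing children
            f.setWritable(true, false);    // allow deleting children
            File[] children = f.listFiles();
            if (children != null) {
                for (File c : children) forceDelete(c);
            }
        }
        if (!f.delete()) {
            throw new IOException("Failed to delete " + f);
        }
    }
}
```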
> >> >
> >> >Otherwise I guess we can just fix those tests...
> >> >
> >> >Colin
> >> >
> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
> >> >> Thanks a lot for looking into HDFS-7722, Chris.
> >> >>
> >> >> In HDFS-7722:
> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >> >>TearDown().
> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
> >> >>
> >> >> Also I ran mvn test several times on my machine and all tests passed.
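The restore-in-a-finally-clause pattern mentioned above looks roughly like this (a generic sketch, not the actual HDFS-7722 diff):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RestoreInFinally {
    public static void main(String[] args) throws IOException {
        File dataDir = Files.createTempDirectory("data-dir").toFile();
        try {
            // Simulate the disk failure the test wants to exercise.
            dataDir.setExecutable(false, false);
            // ... test body runs here and may throw ...
        } finally {
            // Runs even if the test body throws, so the next build's
            // "mvn clean" can delete the directory normally.
            dataDir.setExecutable(true, false);
        }
        System.out.println("canExecute = " + dataDir.canExecute());
        dataDir.delete();
    }
}
```

As Chris points out later in the thread, this still cannot protect against the JUnit process dying between the strip and the restore.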
> >> >>
> >> >> However, since DiskChecker#checkDirAccess() insists on a directory:
> >> >>
> >> >> private static void checkDirAccess(File dir) throws DiskErrorException {
> >> >>   if (!dir.isDirectory()) {
> >> >>     throw new DiskErrorException("Not a directory: "
> >> >>                                  + dir.toString());
> >> >>   }
> >> >>
> >> >>   checkAccessByFileMethods(dir);
> >> >> }
> >> >>
> >> >> one potentially safer alternative is replacing the data dir with a
> >> >> regular file to simulate disk failures.
> >> >>
> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
> >> >><cnauroth@hortonworks.com> wrote:
> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >> >>> TestDataNodeVolumeFailureReporting, and
> >> >>> TestDataNodeVolumeFailureToleration all remove executable permissions
> >> >>> from directories like the one Colin mentioned to simulate disk
> >> >>> failures at data nodes.  I reviewed the code for all of those, and
> >> >>> they all appear to be doing the necessary work to restore executable
> >> >>> permissions at the end of the test.  The only recent uncommitted
> >> >>> patch I've seen that makes changes in these test suites is
> >> >>> HDFS-7722.  That patch still looks fine though.  I don't know if
> >> >>> there are other uncommitted patches that changed these test suites.
> >> >>>
> >> >>> I suppose it's also possible that the JUnit process unexpectedly
> >> >>> died after removing executable permissions but before restoring
> >> >>> them.  That always would have been a weakness of these test suites,
> >> >>> regardless of any recent changes.
> >> >>>
> >> >>> Chris Nauroth
> >> >>> Hortonworks
> >> >>> http://hortonworks.com/
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
> >> >>>
> >> >>>>Hey Colin,
> >> >>>>
> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
> >> >>>>with these boxes. He took a look and concluded that some perms are
> >> >>>>being set in those directories by our unit tests which are
> >> >>>>precluding those files from getting deleted. He's going to clean up
> >> >>>>the boxes for us, but we should expect this to keep happening until
> >> >>>>we can fix the test in question to properly clean up after itself.
> >> >>>>
> >> >>>>To help narrow down which commit it was that started this, Andrew
> >> >>>>sent me this info:
> >> >>>>
> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> >> >>>>has 500 perms, so I'm guessing that's the problem. Been that way
> >> >>>>since 9:32 UTC on March 5th."
> >> >>>>
> >> >>>>--
> >> >>>>Aaron T. Myers
> >> >>>>Software Engineer, Cloudera
> >> >>>>
> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org>
> >> >>>>wrote:
> >> >>>>
> >> >>>>> Hi all,
> >> >>>>>
> >> >>>>> A very quick (and not thorough) survey shows that I can't find any
> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
> >> >>>>> seem to be failing with some variant of this message:
> >> >>>>>
> >> >>>>> [ERROR] Failed to execute goal
> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >> >>>>> -> [Help 1]
> >> >>>>>
> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
> >> >>>>> permissions?
> >> >>>>>
> >> >>>>> Colin
> >> >>>>>
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Lei (Eddy) Xu
> >> >> Software Engineer, Cloudera
> >>
> >>
> >
> >
> >--
> >Sean
>
>


-- 
Sean
