hbase-dev mailing list archives

From Nick Dimiduk <ndimi...@apache.org>
Subject Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
Date Tue, 16 Jun 2015 15:13:55 GMT
I think this is a great idea! Having just gone through the process of
getting Phoenix up to speed with precommits, it would be really nice to
have a place to go other than "fork/hack someone else's work". For the same
project, I recently integrated its first daemon service. This meant adding
a bunch of servicy Python code (multi-platform support is required) which I
only sort of trust. Again, would be great to have an explicit resource for
this kind of thing in the ecosystem. I expect Calcite and Kylin will be
following along shortly.

Since we're tossing out names, how about Apache Bootstrap? It's a
meta-project to help other projects get off the ground, after all.

-n

On Monday, June 15, 2015, Sean Busbey <busbey@cloudera.com> wrote:

> Oof. I had meant to push on this again but life got in the way and now the
> June board meeting is upon us. Sorry everyone. In the event that this ends
> up contentious, hopefully one of the copied communities can give us a
> branch to work in.
>
> I know everyone is busy, so here's the short version of this email: I'd
> like to move some of the code currently in Hadoop (test-patch) into a new
> TLP focused on QA tooling. I'm not sure what the best format for priming
> this conversation is. ORC filled in the incubator project proposal
> template, but I'm not sure how much that confused the issue. So to start,
> I'll just write what I'm hoping we can accomplish in general terms here.
>
> All software development projects that are community based (that is,
> accepting outside contributions) face a common QA problem for vetting
> in-coming contributions. Hadoop is fortunate enough to be sufficiently
> popular that the weight of the problem drove tool development (i.e.
> test-patch). That tool is generalizable enough that a bunch of other TLPs
> have adopted their own forks. Unfortunately, in most projects this kind of
> QA work is an enabler rather than a primary concern, so often the tooling
> is worked on ad hoc and few shared improvements happen across projects.
> Since the tooling itself is never a primary concern, any tooling made is
> rarely reused outside of ASF projects.
>
> Over the last couple months a few of us have been working on generalizing
> the tooling present in the Hadoop code base (because it was the most mature
> out of all those in the various projects) and it's reached a point where we
> think we can start bringing on other downstream users. This means we need
> to start establishing things like a release cadence and to grow the new
> contributors we have to handle more project responsibility. Personally, I
> think that means it's time to move out from under Hadoop to drive things as
> our own community. Eventually, I hope the community can help draw in a
> group of folks traditionally underrepresented in ASF projects, namely QA
> and operations folks.
>
> I think test-patch by itself has enough scope to justify a project. Having
> a solid set of build tools that are customizable to fit the norms of
> different software communities is a bunch of work. Making it work well in
> both the context of automated test systems like Jenkins and for individual
> developers is even more work. We could easily also take over maintenance of
> things like shelldocs, since test-patch is the primary consumer of that
> currently but it's generally useful tooling.
>
> In addition to test-patch, I think the proposed project has some future
> growth potential. Given some adoption of test-patch to prove utility, the
> project could build on the ties it makes to start building tools to help
> projects do their own longer-run testing. Note that I'm talking about the
> tools to build QA processes and not a particular set of tested components.
> Specifically, I think the ChaosMonkey work that's in HBase should be
> generalizable as a fault injection framework (either based on that code or
> something like it). Doing this for arbitrary software is obviously very
> difficult, and a part of easing that will be to make (and then favor)
> tooling to allow projects to have operational glue that looks the same.
> Namely, the shell work that's been done in hadoop-functions.sh would be a
> great foundational layer that could bring good daemon handling practices to
> a whole slew of software projects. In the event that these frameworks and
> tools get adopted by parts of the Hadoop ecosystem, that could make the job
> of, e.g., Bigtop substantially easier.
>
> I've reached out to a few folks who have been involved in the current
> test-patch work or expressed interest in helping out on getting it used in
> other projects. Right now, the proposed PMC would be (alphabetical by last
> name):
>
> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
> pmc, sqoop pmc, all around Jenkins expert)
> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
> * Nick Dimiduk (hbase pmc, phoenix pmc)
> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
> * Andrew Purtell (ASF member, incubator pmc, bigtop pmc, hbase pmc,
> phoenix pmc)
> * Allen Wittenauer (hadoop committer)
>
> That PMC gives us several members and a bunch of folks familiar with the
> ASF. Combined with the code already existing in Apache spaces, I think that
> gives us sufficient justification for a direct board proposal.
>
> The planned project name is "Apache Yetus". It's an archaic genus of sea
> snail and most of our project will be focused on shell scripts.
>
> N.b.: this does not mean that the Hadoop community would _have_ to rely on
> the new TLP, but I hope that once we have a release that can be evaluated
> there'd be enough benefit to strongly encourage it.
>
> This has mostly been focused on scope and community issues, and I'd love to
> talk through any feedback on that. Additionally, are there any other points
> folks want to make sure are covered before we have a resolution?
>
> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
> > Sorry for the resend. I figured this deserves a [DISCUSS] flag.
> >
> >
> >
> > On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <busbey@cloudera.com> wrote:
> >
> >> Hi Folks!
> >>
> >> After working on test-patch with other folks for the last few months, I
> >> think we've reached the point where we can make the fastest progress
> >> towards the goal of a general use pre-commit patch tester by spinning
> >> things into a project focused on just that. I think we have a mature
> >> enough code base and a sufficient fledgling community, so I'm going to
> >> put together a tlp proposal.
> >>
> >> Thanks for the feedback thus far from use within Hadoop. I hope we can
> >> continue to make things more useful.
> >>
> >> -Sean
> >>
> >> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <busbey@cloudera.com> wrote:
> >>
> >>> HBase's dev-support folder is where the scripts and support files live.
> >>> We've only recently started adding anything to the maven builds that's
> >>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where
> >>> I'd add in more if we ran into the same permissions problems y'all are
> >>> having.
> >>>
> >>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
> >>> we don't properly back this up anywhere, we just notify each other of
> >>> changes on a particular mail thread[3].
> >>>
> >>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
> >>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
> >>> red because I just finished fixing "mvn site" running out of permgen)
> >>> [3]: http://s.apache.org/NT0
> >>>
> >>>
> >>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> >>>
> >>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
> >>>> HBase repo?  Is there any additional context we need to be aware of?
> >>>>
> >>>> Chris Nauroth
> >>>> Hortonworks
> >>>> http://hortonworks.com/
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 3/11/15, 2:44 PM, "Sean Busbey" <busbey@cloudera.com> wrote:
> >>>>
> >>>> >+dev@hbase
> >>>> >
> >>>> >HBase has recently been cleaning up our precommit jenkins jobs to make
> >>>> >them more robust. From what I can tell our stuff started off as an
> >>>> >earlier version of what Hadoop uses for testing.
> >>>> >
> >>>> >Folks on either side open to an experiment of combining our precommit
> >>>> >check tooling? In principle we should be looking for the same kinds of
> >>>> >things.
> >>>> >
> >>>> >Naturally we'll still need different jenkins jobs to handle different
> >>>> >resource needs and we'd need to figure out where stuff eventually
> >>>> >lives, but that could come later.
> >>>> >
> >>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> >>>> >
> >>>> >> The only thing I'm aware of is the failOnError option:
> >>>> >>
> >>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
> >>>> >>
> >>>> >>
> >>>> >> I prefer that we don't disable this, because ignoring different kinds
> >>>> >> of failures could leave our build directories in an indeterminate
> >>>> >> state.  For example, we could end up with an old class file on the
> >>>> >> classpath for test runs that was supposedly deleted.
> >>>> >>
> >>>> >> I think it's worth exploring Eddy's suggestion to try simulating
> >>>> >> failure by placing a file where the code expects to see a directory.
> >>>> >> That might even let us enable some of these tests that are skipped on
> >>>> >> Windows, because Windows allows access for the owner even after
> >>>> >> permissions have been stripped.
> >>>> >>
> >>>> >> Chris Nauroth
> >>>> >> Hortonworks
> >>>> >> http://hortonworks.com/
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
> >>>> >>
> >>>> >> >Is there a maven plugin or setting we can use to simply remove
> >>>> >> >directories that have no executable permissions on them?  Clearly we
> >>>> >> >have the permission to do this from a technical point of view (since
> >>>> >> >we created the directories as the jenkins user), it's simply that the
> >>>> >> >code refuses to do it.
> >>>> >> >
> >>>> >> >Otherwise I guess we can just fix those tests...
> >>>> >> >
> >>>> >> >Colin
> >>>> >> >
> >>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
> >>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
> >>>> >> >>
> >>>> >> >> In HDFS-7722:
> >>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
> >>>> >> >> TearDown().
> >>>> >> >> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
> >>>> >> >>
> >>>> >> >> Also I ran mvn test several times on my machine and all tests passed.
> >>>> >> >>
> >>>> >> >> However, DiskChecker#checkDirAccess() first verifies that the path
> >>>> >> >> is a directory:
> >>>> >> >>
> >>>> >> >> private static void checkDirAccess(File dir) throws DiskErrorException {
> >>>> >> >>   if (!dir.isDirectory()) {
> >>>> >> >>     throw new DiskErrorException("Not a directory: "
> >>>> >> >>                                  + dir.toString());
> >>>> >> >>   }
> >>>> >> >>
> >>>> >> >>   checkAccessByFileMethods(dir);
> >>>> >> >> }
> >>>> >> >>
> >>>> >> >> So one potentially safer alternative is replacing the data dir with a
> >>>> >> >> regular file to simulate disk failures.
> >>>> >> >>
> >>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> >>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> >>>> >> >>> TestDataNodeVolumeFailureReporting, and
> >>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable
> >>>> >> >>> permissions from directories like the one Colin mentioned to
> >>>> >> >>> simulate disk failures at data nodes.  I reviewed the code for all
> >>>> >> >>> of those, and they all appear to be doing the necessary work to
> >>>> >> >>> restore executable permissions at the end of the test.  The only
> >>>> >> >>> recent uncommitted patch I've seen that makes changes in these test
> >>>> >> >>> suites is HDFS-7722.  That patch still looks fine though.  I don't
> >>>> >> >>> know if there are other uncommitted patches that changed these test
> >>>> >> >>> suites.
> >>>> >> >>>
> >>>> >> >>> I suppose it's also possible that the JUnit process unexpectedly
> >>>> >> >>> died after removing executable permissions but before restoring
> >>>> >> >>> them.  That always would have been a weakness of these test suites,
> >>>> >> >>> regardless of any recent changes.
> >>>> >> >>>
> >>>> >> >>> Chris Nauroth
> >>>> >> >>> Hortonworks
> >>>> >> >>> http://hortonworks.com/
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>>
> >>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
> >>>> >> >>>
> >>>> >> >>>>Hey Colin,
> >>>> >> >>>>
> >>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on
> >>>> >> >>>>with these boxes. He took a look and concluded that some perms are
> >>>> >> >>>>being set in those directories by our unit tests which are
> >>>> >> >>>>precluding those files from getting deleted. He's going to clean up
> >>>> >> >>>>the boxes for us, but we should expect this to keep happening until
> >>>> >> >>>>we can fix the test in question to properly clean up after itself.
> >>>> >> >>>>
> >>>> >> >>>>To help narrow down which commit it was that started this, Andrew
> >>>> >> >>>>sent me this info:
> >>>> >> >>>>
> >>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> >>>> >> >>>>has 500 perms, so I'm guessing that's the problem. Been that way
> >>>> >> >>>>since 9:32 UTC on March 5th."
> >>>> >> >>>>
> >>>> >> >>>>--
> >>>> >> >>>>Aaron T. Myers
> >>>> >> >>>>Software Engineer, Cloudera
> >>>> >> >>>>
> >>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org> wrote:
> >>>> >> >>>>
> >>>> >> >>>>> Hi all,
> >>>> >> >>>>>
> >>>> >> >>>>> A very quick (and not thorough) survey shows that I can't find any
> >>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
> >>>> >> >>>>> seem to be failing with some variant of this message:
> >>>> >> >>>>>
> >>>> >> >>>>> [ERROR] Failed to execute goal
> >>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
> >>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
> >>>> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> >>>> >> >>>>> -> [Help 1]
> >>>> >> >>>>>
> >>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
> >>>> >> >>>>> permissions?
> >>>> >> >>>>>
> >>>> >> >>>>> Colin
> >>>> >> >>>>>
> >>>> >> >>>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> --
> >>>> >> >> Lei (Eddy) Xu
> >>>> >> >> Software Engineer, Cloudera
> >>>> >>
> >>>> >>
> >>>> >
> >>>> >
> >>>> >--
> >>>> >Sean
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Sean
> >>>
> >>
> >>
> >>
> >> --
> >> Sean
> >>
> >
> >
> >
> > --
> > Sean
> >
>
>
>
> --
> Sean
>
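
To make the suggestion quoted in the thread above concrete (placing a regular file where the code expects a data directory, instead of stripping permissions), here is a minimal Java sketch. It is an illustration under stated assumptions, not Hadoop's test code: the class name SimulatedDiskFailure, the helper replaceDirWithFile, and the local DiskErrorException are hypothetical stand-ins, and checkDirAccess reproduces only the isDirectory() check from the quoted snippet.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SimulatedDiskFailure {

    /** Hypothetical stand-in for Hadoop's DiskErrorException. */
    static class DiskErrorException extends IOException {
        DiskErrorException(String msg) {
            super(msg);
        }
    }

    /** Reproduces only the isDirectory() check from the quoted snippet. */
    static void checkDirAccess(File dir) throws DiskErrorException {
        if (!dir.isDirectory()) {
            throw new DiskErrorException("Not a directory: " + dir);
        }
    }

    /** Hypothetical helper: swap an empty directory for a regular file. */
    static void replaceDirWithFile(Path dataDir) throws IOException {
        Files.delete(dataDir);      // succeeds only if the directory is empty
        Files.createFile(dataDir);  // same path, now a regular file
    }

    /** Returns true if the check rejects the path after the swap. */
    static boolean failsAfterReplacement() {
        Path root = null;
        try {
            root = Files.createTempDirectory("sim-disk-failure");
            Path dataDir = Files.createDirectory(root.resolve("data3"));
            checkDirAccess(dataDir.toFile());  // healthy: passes
            replaceDirWithFile(dataDir);
            try {
                checkDirAccess(dataDir.toFile());
                return false;                  // check unexpectedly passed
            } catch (DiskErrorException expected) {
                return true;                   // simulated failure detected
            } finally {
                // No permissions were changed, so plain deletion works.
                Files.deleteIfExists(dataDir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (root != null) {
                try {
                    Files.deleteIfExists(root);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temporary directory
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("check fails after replacement: " + failsAfterReplacement());
    }
}
```

Because no permission bits are touched, a test process that dies midway leaves behind only a stray regular file, which an ordinary clean step can still delete.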

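On the question raised in the quoted thread about removing directories whose permissions were stripped by a test that died before restoring them, the following is a hedged Java sketch of a cleanup helper that restores owner permissions on each directory before deleting it. It is not a Maven plugin or setting; PermissionRestoringCleaner, deleteTree, and demo are hypothetical names, and it assumes the process owns the directories (as the jenkins user did in the case described).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class PermissionRestoringCleaner {

    /**
     * Recursively deletes a tree, restoring owner read/write/execute on
     * every real directory first. Symbolic links are removed, not followed.
     */
    static boolean deleteTree(File path) {
        boolean isRealDir = path.isDirectory() && !Files.isSymbolicLink(path.toPath());
        if (isRealDir) {
            path.setReadable(true, true);
            path.setWritable(true, true);
            path.setExecutable(true, true);
            File[] children = path.listFiles();
            if (children != null) {
                for (File child : children) {
                    if (!deleteTree(child)) {
                        return false;
                    }
                }
            }
        }
        return path.delete();
    }

    /** Builds a small tree, strips permissions on a subdirectory, then cleans up. */
    static boolean demo() {
        try {
            File root = Files.createTempDirectory("perm-cleanup").toFile();
            File data3 = new File(root, "data3");
            if (!data3.mkdir()) {
                throw new IOException("could not create " + data3);
            }
            Files.createFile(new File(data3, "block").toPath());
            // Mimic a test that removed permissions and never restored them.
            data3.setWritable(false, false);
            data3.setExecutable(false, false);
            return deleteTree(root) && !root.exists();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("tree removed: " + demo());
    }
}
```

A helper like this only treats the symptom; the tests would still need to restore permissions themselves, or avoid changing them at all.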