hadoop-common-dev mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
Date Wed, 24 Jun 2015 14:41:44 GMT
Hi Folks!

Work in a feature branch is now being tracked by HADOOP-12111.

On Thu, Jun 18, 2015 at 10:07 PM, Sean Busbey <busbey@cloudera.com> wrote:

> It looks like we have consensus.
>
> I'll start drafting up a proposal for the next board meeting (July 15th).
> Once we work out the name I'll submit a PODLINGNAMESEARCH jira to track
> that we did due diligence on whatever we pick.
>
> In the meantime, Hadoop PMC, would y'all be willing to host us in a branch
> so that we can start prepping things now? We would want branch commit
> rights for the proposed new PMC.
>
>
> -Sean
>
>
> On Mon, Jun 15, 2015 at 6:47 PM, Sean Busbey <busbey@cloudera.com> wrote:
>
>> Oof. I had meant to push on this again but life got in the way and now
>> the June board meeting is upon us. Sorry everyone. In the event that this
>> ends up contentious, hopefully one of the copied communities can give us a
>> branch to work in.
>>
>> I know everyone is busy, so here's the short version of this email: I'd
>> like to move some of the code currently in Hadoop (test-patch) into a new
>> TLP focused on QA tooling. I'm not sure what the best format for priming
>> this conversation is. ORC filled in the incubator project proposal
>> template, but I'm not sure how much that confused the issue. So to start,
>> I'll just write what I'm hoping we can accomplish in general terms here.
>>
>> All software development projects that are community based (that is,
>> accepting outside contributions) face a common QA problem for vetting
>> in-coming contributions. Hadoop is fortunate enough to be sufficiently
>> popular that the weight of the problem drove tool development (i.e.
>> test-patch). That tool is generalizable enough that a bunch of other TLPs
>> have adopted their own forks. Unfortunately, in most projects this kind of
>> QA work is an enabler rather than a primary concern, so often the tooling
>> is worked on ad hoc and few shared improvements happen across projects. Since
>> the tooling itself is never a primary concern, any improvement made is rarely
>> reused outside of ASF projects.
>>
>> Over the last couple months a few of us have been working on generalizing
>> the tooling present in the Hadoop code base (because it was the most mature
>> out of all those in the various projects) and it's reached a point where we
>> think we can start bringing on other downstream users. This means we need
>> to start establishing things like a release cadence and to grow the new
>> contributors we have to handle more project responsibility. Personally, I
>> think that means it's time to move out from under Hadoop to drive things as
>> our own community. Eventually, I hope the community can help draw in a
>> group of folks traditionally underrepresented in ASF projects, namely QA
>> and operations folks.
>>
>> I think test-patch by itself has enough scope to justify a project.
>> Having a solid set of build tools that are customizable to fit the norms of
>> different software communities is a bunch of work. Making it work well in
>> both the context of automated test systems like Jenkins and for individual
>> developers is even more work. We could easily also take over maintenance of
>> things like shelldocs, since test-patch is the primary consumer of that
>> currently but it's generally useful tooling.
>>
>> In addition to test-patch, I think the proposed project has some future
>> growth potential. Given some adoption of test-patch to prove utility, the
>> project could build on the ties it makes to start building tools to help
>> projects do their own longer-run testing. Note that I'm talking about the
>> tools to build QA processes and not a particular set of tested components.
>> Specifically, I think the ChaosMonkey work that's in HBase should be
>> generalizable as a fault injection framework (either based on that code or
>> something like it). Doing this for arbitrary software is obviously very
>> difficult, and a part of easing that will be to make (and then favor)
>> tooling to allow projects to have operational glue that looks the same.
>> Namely, the shell work that's been done in hadoop-functions.sh would be a
>> great foundational layer that could bring good daemon handling practices to
>> a whole slew of software projects. In the event that these frameworks and
>> tools get adopted by parts of the Hadoop ecosystem, that could make the job
>> of e.g. Bigtop substantially easier.
>>
>> I've reached out to a few folks who have been involved in the current
>> test-patch work or expressed interest in helping out on getting it used in
>> other projects. Right now, the proposed PMC would be (alphabetical by last
>> name):
>>
>> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds
>> pmc, sqoop pmc, all around Jenkins expert)
>> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
>> * Nick Dimiduk (hbase pmc, phoenix pmc)
>> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
>> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
>> phoenix pmc)
>> * Allen Wittenauer (hadoop committer)
>>
>> That PMC gives us several members and a bunch of folks familiar with the
>> ASF. Combined with the code already existing in Apache spaces, I think that
>> gives us sufficient justification for a direct board proposal.
>>
>> The planned project name is "Apache Yetus". It's an archaic genus of sea
>> snail and most of our project will be focused on shell scripts.
>>
>> N.b.: this does not mean that the Hadoop community would _have_ to rely
>> on the new TLP, but I hope that once we have a release that can be
>> evaluated there'd be enough benefit to strongly encourage it.
>>
>> This has mostly been focused on scope and community issues, and I'd love
>> to talk through any feedback on that. Additionally, are there any other
>> points folks want to make sure are covered before we have a resolution?
>>
>> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey <busbey@cloudera.com> wrote:
>>
>>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>>
>>>
>>>
>>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey <busbey@cloudera.com>
>>> wrote:
>>>
>>>> Hi Folks!
>>>>
>>>> After working on test-patch with other folks for the last few months, I
>>>> think we've reached the point where we can make the fastest progress
>>>> towards the goal of a general use pre-commit patch tester by spinning
>>>> things into a project focused on just that. I think we have a mature enough
>>>> code base and a sufficient fledgling community, so I'm going to put
>>>> together a tlp proposal.
>>>>
>>>> Thanks for the feedback thus far from use within Hadoop. I hope we can
>>>> continue to make things more useful.
>>>>
>>>> -Sean
>>>>
>>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey <busbey@cloudera.com>
>>>> wrote:
>>>>
>>>>> HBase's dev-support folder is where the scripts and support files
>>>>> live. We've only recently started adding anything to the maven builds
>>>>> that's specific to jenkins[1]; so far it's diagnostic stuff, but that's
>>>>> where I'd add in more if we ran into the same permissions problems y'all
>>>>> are having.
>>>>>
>>>>> There's also our precommit job itself, though it isn't large[2].
>>>>> AFAIK, we don't properly back this up anywhere, we just notify each other
>>>>> of changes on a particular mail thread[3].
>>>>>
>>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're
>>>>> all red because I just finished fixing "mvn site" running out of permgen)
>>>>> [3]: http://s.apache.org/NT0
>>>>>
>>>>>
>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth <
>>>>> cnauroth@hortonworks.com> wrote:
>>>>>
>>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>>> HBase repo?  Is there any additional context we need to be aware of?
>>>>>>
>>>>>> Chris Nauroth
>>>>>> Hortonworks
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/11/15, 2:44 PM, "Sean Busbey" <busbey@cloudera.com> wrote:
>>>>>>
>>>>>> >+dev@hbase
>>>>>> >
>>>>>> >HBase has recently been cleaning up our precommit jenkins jobs to make
>>>>>> >them more robust. From what I can tell our stuff started off as an earlier
>>>>>> >version of what Hadoop uses for testing.
>>>>>> >
>>>>>> >Folks on either side open to an experiment of combining our precommit
>>>>>> >check tooling? In principle we should be looking for the same kinds of
>>>>>> >things.
>>>>>> >
>>>>>> >Naturally we'll still need different jenkins jobs to handle different
>>>>>> >resource needs and we'd need to figure out where stuff eventually lives,
>>>>>> >but that could come later.
>>>>>> >
>>>>>> >On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth <cnauroth@hortonworks.com>
>>>>>> >wrote:
>>>>>> >
>>>>>> >> The only thing I'm aware of is the failOnError option:
>>>>>> >>
>>>>>> >> http://maven.apache.org/plugins/maven-clean-plugin/examples/ignoring-errors.html
>>>>>> >>
>>>>>> >> I prefer that we don't disable this, because ignoring different kinds of
>>>>>> >> failures could leave our build directories in an indeterminate state. For
>>>>>> >> example, we could end up with an old class file on the classpath for test
>>>>>> >> runs that was supposedly deleted.
>>>>>> >>
>>>>>> >> I think it's worth exploring Eddy's suggestion to try simulating failure
>>>>>> >> by placing a file where the code expects to see a directory.  That might
>>>>>> >> even let us enable some of these tests that are skipped on Windows,
>>>>>> >> because Windows allows access for the owner even after permissions have
>>>>>> >> been stripped.
>>>>>> >>
>>>>>> >> Chris Nauroth
>>>>>> >> Hortonworks
>>>>>> >> http://hortonworks.com/
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> On 3/11/15, 2:10 PM, "Colin McCabe" <cmccabe@alumni.cmu.edu> wrote:
>>>>>> >>
>>>>>> >> >Is there a maven plugin or setting we can use to simply remove
>>>>>> >> >directories that have no executable permissions on them?  Clearly we
>>>>>> >> >have the permission to do this from a technical point of view (since
>>>>>> >> >we created the directories as the jenkins user), it's simply that the
>>>>>> >> >code refuses to do it.
>>>>>> >> >
>>>>>> >> >Otherwise I guess we can just fix those tests...
>>>>>> >> >
>>>>>> >> >Colin
>>>>>> >> >
>>>>>> >> >On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <lei@cloudera.com> wrote:
>>>>>> >> >> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>> >> >>
>>>>>> >> >> In HDFS-7722:
>>>>>> >> >> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>>> >> >> TearDown().
>>>>>> >> >> TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>>>>>> >> >>
>>>>>> >> >> Also I ran mvn test several times on my machine and all tests passed.
>>>>>> >> >>
>>>>>> >> >> However, since in DiskChecker#checkDirAccess():
>>>>>> >> >>
>>>>>> >> >> private static void checkDirAccess(File dir) throws DiskErrorException {
>>>>>> >> >>   if (!dir.isDirectory()) {
>>>>>> >> >>     throw new DiskErrorException("Not a directory: "
>>>>>> >> >>                                  + dir.toString());
>>>>>> >> >>   }
>>>>>> >> >>
>>>>>> >> >>   checkAccessByFileMethods(dir);
>>>>>> >> >> }
>>>>>> >> >>
>>>>>> >> >> One potentially safer alternative is replacing the data dir with a
>>>>>> >> >> regular file to simulate disk failures.
>>>>>> >> >>
>>>>>> >> >> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>> >> >><cnauroth@hortonworks.com> wrote:
>>>>>> >> >>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>> >> >>> TestDataNodeVolumeFailureReporting, and
>>>>>> >> >>> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>>>> >> >>> from directories like the one Colin mentioned to simulate disk failures
>>>>>> >> >>> at data nodes.  I reviewed the code for all of those, and they all
>>>>>> >> >>> appear to be doing the necessary work to restore executable permissions
>>>>>> >> >>> at the end of the test.  The only recent uncommitted patch I've seen
>>>>>> >> >>> that makes changes in these test suites is HDFS-7722.  That patch still
>>>>>> >> >>> looks fine though.  I don't know if there are other uncommitted patches
>>>>>> >> >>> that changed these test suites.
>>>>>> >> >>>
>>>>>> >> >>> I suppose it's also possible that the JUnit process unexpectedly died
>>>>>> >> >>> after removing executable permissions but before restoring them.  That
>>>>>> >> >>> always would have been a weakness of these test suites, regardless of
>>>>>> >> >>> any recent changes.
>>>>>> >> >>>
>>>>>> >> >>> Chris Nauroth
>>>>>> >> >>> Hortonworks
>>>>>> >> >>> http://hortonworks.com/
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <atm@cloudera.com> wrote:
>>>>>> >> >>>
>>>>>> >> >>>>Hey Colin,
>>>>>> >> >>>>
>>>>>> >> >>>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>>>>> >> >>>>these boxes. He took a look and concluded that some perms are being
>>>>>> >> >>>>set in those directories by our unit tests which are precluding those
>>>>>> >> >>>>files from getting deleted. He's going to clean up the boxes for us, but
>>>>>> >> >>>>we should expect this to keep happening until we can fix the test in
>>>>>> >> >>>>question to properly clean up after itself.
>>>>>> >> >>>>
>>>>>> >> >>>>To help narrow down which commit it was that started this, Andrew sent
>>>>>> >> >>>>me this info:
>>>>>> >> >>>>
>>>>>> >> >>>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>>> >> >>>>has 500 perms, so I'm guessing that's the problem. Been that way since
>>>>>> >> >>>>9:32 UTC on March 5th."
>>>>>> >> >>>>
>>>>>> >> >>>>--
>>>>>> >> >>>>Aaron T. Myers
>>>>>> >> >>>>Software Engineer, Cloudera
>>>>>> >> >>>>
>>>>>> >> >>>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org>
>>>>>> >> >>>>wrote:
>>>>>> >> >>>>
>>>>>> >> >>>>> Hi all,
>>>>>> >> >>>>>
>>>>>> >> >>>>> A very quick (and not thorough) survey shows that I can't find any
>>>>>> >> >>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>>>>>> >> >>>>> seem to be failing with some variant of this message:
>>>>>> >> >>>>>
>>>>>> >> >>>>> [ERROR] Failed to execute goal
>>>>>> >> >>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>>> >> >>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>> >> >>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> >> >>>>> -> [Help 1]
>>>>>> >> >>>>>
>>>>>> >> >>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>>> >> >>>>> permissions?
>>>>>> >> >>>>>
>>>>>> >> >>>>> Colin
>>>>>> >> >>>>>
>>>>>> >> >>>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> --
>>>>>> >> >> Lei (Eddy) Xu
>>>>>> >> >> Software Engineer, Cloudera
>>>>>> >>
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> >--
>>>>>> >Sean
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sean
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>>
>> --
>> Sean
>>
>
>
>
> --
> Sean
>



-- 
Sean
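
For reference, Eddy's suggestion quoted further down the thread (simulate a volume failure by putting a regular file where the data directory is expected, instead of stripping permissions) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the change made in HDFS-7722: the class name SimulatedDiskFailure, the helper failsWhileReplaced, and the ".parked" suffix are hypothetical, and a plain IOException stands in for Hadoop's DiskErrorException so the example has no Hadoop dependency.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SimulatedDiskFailure {

    // Mirrors the directory check quoted in the thread. IOException is a
    // stand-in for DiskErrorException (assumption made for this sketch).
    static void checkDirAccess(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IOException("Not a directory: " + dir);
        }
    }

    // Hypothetical helper: parks the real directory under a sibling name,
    // puts a regular file in its place, and reports whether the check fails.
    // No permissions are changed at any point.
    static boolean failsWhileReplaced(Path dataDir) throws IOException {
        Path parked = dataDir.resolveSibling(dataDir.getFileName() + ".parked");
        Files.move(dataDir, parked);   // rename keeps the directory contents
        Files.createFile(dataDir);     // regular file where a directory is expected
        try {
            checkDirAccess(dataDir.toFile());
            return false;              // the check did not notice the "failure"
        } catch (IOException expected) {
            return true;               // the check rejected the path
        } finally {
            Files.delete(dataDir);     // remove the placeholder file
            Files.move(parked, dataDir);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("volume-failure-sketch");
        Path dataDir = Files.createDirectory(base.resolve("data3"));
        System.out.println("failure detected: " + failsWhileReplaced(dataDir));
        System.out.println("restored as directory: " + Files.isDirectory(dataDir));
        Files.delete(dataDir);
        Files.delete(base);
    }
}
```

The appeal of this approach, as far as I can tell, is that if the test JVM dies before the finally block runs, what is left behind is an ordinary file and an ordinary directory with their original permissions, so a later "mvn clean" should still be able to delete them.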
