hbase-dev mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
Date Tue, 16 Jun 2015 05:38:31 GMT

ZooKeeper is another project that has expressed interest in improving its
pre-commit process lately.  I understand Allen has had some success
applying this to the ZooKeeper build too, with some small caveats around
quirks in the build.xml that I think we can resolve.

I'm interested in defining how the release model works for a project like
this.  The current model of forking and checking it in directly to
multiple projects leads to the fragmentation and bugs described earlier in
the thread.  Another possible model is something more dynamic, like a
bootstrap script capable of checking out a release from a git tag before
launching pre-commit.  I'm interested to hear from various projects on how
they'd like to integrate.

--Chris Nauroth

On 6/15/15, 8:57 PM, "Josh Elser" <elserj@apache.org> wrote:

>(Have been talking to Sean in private on the subject -- seems
>appropriate to voice some public support)
>I'd be interested in this for Accumulo and Slider. For Accumulo, we've
>come a long way without a pre-commit build, primarily due to a CTR
>process. We have seen the repeated questions of "how do I run the tests"
>which a more automated workflow would help with, IMO. I think Slider
>could benefit for the same reasons.
>I'd also be giddy to see the recent improvements in Hadoop trickle down
>into the other projects that Allen already mentioned.
>Take this as record that I'd be happy to try to help out where possible.
>Sean Busbey wrote:
>> thank you for making a more digestible version Allen. :)
>> If you're interested in soliciting feedback from other projects, I
>> made ASF short links to this thread in common-dev and hbase:
>> * http://s.apache.org/yetus-discuss-hadoop
>> * http://s.apache.org/yetus-discuss-hbase
>> While I agree that it's important to get feedback from ASF projects that
>> might find this useful, I can say that recently I've been involved in
>> non-ASF project YCSB and both the pretest and better shell stuff would be
>> immensely useful over there.
>> On Mon, Jun 15, 2015 at 10:36 PM, Allen Wittenauer<aw@altiscale.com>
>>>          I'm clearly +1 on this idea.  As part of the rewrite in Hadoop of
>>> test-patch, it was amazing to see how far and wide this bit of code has
>>> spread.  So I see consolidating everyone's efforts as a huge win for a
>>> large number of projects.  (esp considering how many I saw suffering from a
>>> variety of identified bugs! )
>>>          But….
>>>          I think it's important for people involved in those other
>>> to speak up and voice an opinion as to whether this is useful.
>>> To summarize:
>>>          In the short term, a single location to get/use a precommit
>>> tester rather than everyone building/supporting their own in their
>>> time.
>>>           FWIW, we've already got the code base modified to be
>>> We've written some basic/simple plugins that support Hadoop, HBase,
>>> Tez, Pig, and Flink.  For HBase and Flink, this does include their
>>> checks.  Adding support for other project shouldn't be hard.  Simple
>>> projects take almost no time after seeing the basic pattern.
>>>          I think it's worthwhile highlighting that means support for
>>> JIRA and GitHub as well as Ant and Maven from the same code base.
>>> Longer term:
>>>          Well, we clearly have ideas of things that we want to do.
>>> more features to test-patch (review board? gradle?) is obvious. But
>>> about teasing apart and generalizing some of the other shell bits from
>>> projects? A common library for building CLI tools to fault injection to
>>> release documentation creation tools to …  I'd even like to see us get
>>> advanced as a "run this program to auto-generate daemon stop/start
>>>          I had a few chats with people about this idea at Hadoop
>>> What's truly exciting are the ideas that people had once they realized
>>> kinds of problems we're trying to solve.  It's always amazing the
>>> that projects have that could be solved by these types of solutions.
>>> stop hiding our cool toys in this area.
>>>          So, what feedback and ideas do you have in this area?  Are you a
>>> yay or a nay?
>>> On Jun 15, 2015, at 4:47 PM, Sean Busbey<busbey@cloudera.com>  wrote:
>>>> Oof. I had meant to push on this again but life got in the way and now the
>>>> June board meeting is upon us. Sorry everyone. In the event that this ends
>>>> up contentious, hopefully one of the copied communities can give us a
>>>> branch to work in.
>>>> I know everyone is busy, so here's the short version of this email:
>>>> like to move some of the code currently in Hadoop (test-patch) into a
>>>> TLP focused on QA tooling. I'm not sure what the best format for
>>>> this conversation is. ORC filled in the incubator project proposal
>>>> template, but I'm not sure how much that confused the issue. So to
>>>> I'll just write what I'm hoping we can accomplish in general terms
>>>> All software development projects that are community based (that is,
>>>> accepting outside contributions) face a common QA problem for vetting
>>>> in-coming contributions. Hadoop is fortunate enough to be sufficiently
>>>> popular that the weight of the problem drove tool development (i.e.
>>>> test-patch). That tool is generalizable enough that a bunch of other
>>>> have adopted their own forks. Unfortunately, in most projects this
>>> of
>>>> QA work is an enabler rather than a primary concern, so often the
>>>> is worked on ad-hoc and little shared improvements happen across
>>>> projects. Since
>>>> the tooling itself is never a primary concern, any made is rarely
>>>> outside of ASF projects.
>>>> Over the last couple months a few of us have been working on
>>>> the tooling present in the Hadoop code base (because it was the most
>>> mature
>>>> out of all those in the various projects) and it's reached a point
>>> we
>>>> think we can start bringing on other downstream users. This means we
>>>> to start establishing things like a release cadence and to grow the
>>>> contributors we have to handle more project responsibility.
>>>>Personally, I
>>>> think that means it's time to move out from under Hadoop to drive
>>> as
>>>> our own community. Eventually, I hope the community can help draw in a
>>>> group of folks traditionally underrepresented in ASF projects, namely
>>>> and operations folks.
>>>> I think test-patch by itself has enough scope to justify a project. Having
>>>> a solid set of build tools that are customizable to fit the norms of
>>>> different software communities is a bunch of work. Making it work well in
>>>> both the context of automated test systems like Jenkins and for individual
>>>> developers is even more work. We could easily also take over
>>> of
>>>> things like shelldocs, since test-patch is the primary consumer of
>>>> currently but it's generally useful tooling.
>>>> In addition to test-patch, I think the proposed project has some
>>>> growth potential. Given some adoption of test-patch to prove utility,
>>>> project could build on the ties it makes to start building tools to
>>>> projects do their own longer-run testing. Note that I'm talking about
>>>> tools to build QA processes and not a particular set of tested components.
>>>> Specifically, I think the ChaosMonkey work that's in HBase should be
>>>> generalizable as a fault injection framework (either based on that
>>> or
>>>> something like it). Doing this for arbitrary software is obviously
>>>> difficult, and a part of easing that will be to make (and then favor)
>>>> tooling to allow projects to have operational glue that looks the
>>>> Namely, the shell work that's been done in hadoop-functions.sh would be a
>>>> great foundational layer that could bring good daemon handling
>>> to
>>>> a whole slew of software projects. In the event that these frameworks and
>>>> tools get adopted by parts of the Hadoop ecosystem, that could make the job
>>>> of e.g. Bigtop substantially easier.
>>>> I've reached out to a few folks who have been involved in the current
>>>> test-patch work or expressed interest in helping out on getting it
>>> in
>>>> other projects. Right now, the proposed PMC would be (alphabetical by last
>>>> name):
>>>> * Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc,
>>>> pmc, sqoop pmc, all around Jenkins expert)
>>>> * Sean Busbey (ASF member, accumulo pmc, hbase pmc)
>>>> * Nick Dimiduk (hbase pmc, phoenix pmc)
>>>> * Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
>>>> * Andrew Purtell  (ASF member, incubator pmc, bigtop pmc, hbase pmc,
>>>> phoenix pmc)
>>>> * Allen Wittenauer (hadoop committer)
>>>> That PMC gives us several members and a bunch of folks familiar with the
>>>> ASF. Combined with the code already existing in Apache spaces, I think that
>>>> gives us sufficient justification for a direct board proposal.
>>>> The planned project name is "Apache Yetus". It's an archaic genus of
>>>> snail and most of our project will be focused on shell scripts.
>>>> N.b.: this does not mean that the Hadoop community would _have_ to
>>> on
>>>> the new TLP, but I hope that once we have a release that can be
>>>> there'd be enough benefit to strongly encourage it.
>>>> This has mostly been focused on scope and community issues, and I'd
>>> to
>>>> talk through any feedback on that. Additionally, are there any other points
>>>> folks want to make sure are covered before we have a resolution?
>>>> On Sat, Jun 6, 2015 at 10:43 PM, Sean Busbey<busbey@cloudera.com> wrote:
>>>>> Sorry for the resend. I figured this deserves a [DISCUSS] flag.
>>>>> On Sat, Jun 6, 2015 at 10:39 PM, Sean Busbey<busbey@cloudera.com> wrote:
>>>>>> Hi Folks!
>>>>>> After working on test-patch with other folks for the last few months, I
>>>>>> think we've reached the point where we can make the fastest progress
>>>>>> towards the goal of a general use pre-commit patch tester by
>>>>>> things into a project focused on just that. I think we have a mature enough
>>>>>> code base and a sufficient fledgling community, so I'm going to put
>>>>>> together a tlp proposal.
>>>>>> Thanks for the feedback thus far from use within Hadoop. I hope we
>>>>>> continue to make things more useful.
>>>>>> -Sean
>>>>>> On Wed, Mar 11, 2015 at 5:16 PM, Sean Busbey<busbey@cloudera.com> wrote:
>>>>>>> HBase's dev-support folder is where the scripts and support files live.
>>>>>>> We've only recently started adding anything to the maven builds
>>>>>>> specific to jenkins[1]; so far it's diagnostic stuff, but that's where I'd
>>>>>>> add in more if we ran into the same permissions problems y'all are having.
>>>>>>> There's also our precommit job itself, though it isn't large[2]. AFAIK,
>>>>>>> we don't properly back this up anywhere, we just notify each
>>>>>>> changes on a particular mail thread[3].
>>>>>>> [1]: https://github.com/apache/hbase/blob/master/pom.xml#L1687
>>>>>>> [2]: https://builds.apache.org/job/PreCommit-HBASE-Build/ (they're all
>>>>>>> red because I just finished fixing "mvn site" running out of
>>>>>>> [3]: http://s.apache.org/NT0
>>>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Chris Nauroth<cnauroth@hortonworks.com> wrote:
>>>>>>>> Sure, thanks Sean!  Do we just look in the dev-support folder in the
>>>>>>>> HBase repo?  Is there any additional context we need to be aware of?
>>>>>>>> Chris Nauroth
>>>>>>>> Hortonworks
>>>>>>>> http://hortonworks.com/
>>>>>>>> On 3/11/15, 2:44 PM, "Sean Busbey"<busbey@cloudera.com>
>>>>>>>>> +dev@hbase
>>>>>>>>> HBase has recently been cleaning up our precommit jenkins jobs to make
>>>>>>>>> them more robust. From what I can tell our stuff started off as an
>>>>>>>>> earlier version of what Hadoop uses for testing.
>>>>>>>>> Folks on either side open to an experiment of combining precommit check
>>>>>>>>> tooling? In principle we should be looking for the same kinds of things.
>>>>>>>>> Naturally we'll still need different jenkins jobs to handle different
>>>>>>>>> resource needs and we'd need to figure out where stuff lives,
>>>>>>>>> but that could come later.
>>>>>>>>> On Wed, Mar 11, 2015 at 4:34 PM, Chris Nauroth<cnauroth@hortonworks.com> wrote:
>>>>>>>>>> The only thing I'm aware of is the failOnError option:
>>>>>>>>>> rs
>>>>>>>>>> .html
>>>>>>>>>> I prefer that we don't disable this, because ignoring
>>>>>>>> kinds of
>>>>>>>>>> failures could leave our build directories in an
>>> state.
>>>>>>>>>> For example, we could end up with an old class file on the classpath for
>>>>>>>>>> test runs that was supposedly deleted.
>>>>>>>>>> I think it's worth exploring Eddy's suggestion to try simulating failure
>>>>>>>>>> by placing a file where the code expects to see a
>>>>>>>> might
>>>>>>>>>> even let us enable some of these tests that are skipped
>>>>>>>>>> because Windows allows access for the owner even
>>>>>>>> have
>>>>>>>>>> been stripped.
>>>>>>>>>> Chris Nauroth
>>>>>>>>>> Hortonworks
>>>>>>>>>> http://hortonworks.com/
>>>>>>>>>> On 3/11/15, 2:10 PM, "Colin McCabe"<cmccabe@alumni.cmu.edu> wrote:
>>>>>>>>>>> Is there a maven plugin or setting we can use to simply remove
>>>>>>>>>>> directories that have no executable permissions on them?  We
>>>>>>>>>>> have the permission to do this from a technical point of view (since
>>>>>>>>>>> we created the directories as the jenkins user), it's simply that the
>>>>>>>>>>> code refuses to do it.
>>>>>>>>>>> Otherwise I guess we can just fix those tests...
>>>>>>>>>>> Colin
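For what it's worth, the cleanup Colin is asking about can be sketched in plain Java. This is only an illustration, not a Maven plugin or anything from the Hadoop tree: the class name `ForceDeleteSketch` and the method `forceDelete` are made up here, and it assumes the process owns the directories (as the jenkins user would) so that it is allowed to put the owner permission bits back before deleting.

```java
import java.io.File;
import java.nio.file.Files;

/** Hypothetical sketch: delete a tree even if some directories lost permission bits. */
class ForceDeleteSketch {

    /**
     * Restores owner read/write/execute on each directory before descending,
     * then deletes bottom-up. Returns true if the given path itself was deleted.
     */
    static boolean forceDelete(File f) {
        File[] children = null;
        // Do not follow symbolic links into directories outside the tree.
        if (f.isDirectory() && !Files.isSymbolicLink(f.toPath())) {
            // Owner-only permission changes; may be no-ops on some platforms.
            f.setReadable(true);
            f.setWritable(true);
            f.setExecutable(true);
            children = f.listFiles();
        }
        if (children != null) {
            for (File child : children) {
                forceDelete(child);
            }
        }
        return f.delete();
    }

    public static void main(String[] args) {
        File root = new File(System.getProperty("java.io.tmpdir"),
                "force-delete-sketch-" + System.nanoTime());
        File locked = new File(root, "data3");
        if (!new File(locked, "current").mkdirs()) {
            throw new IllegalStateException("could not create " + locked);
        }
        // Mimic a test that stripped permissions and never restored them.
        locked.setWritable(false);
        locked.setExecutable(false);
        System.out.println("deleted: " + forceDelete(root));
    }
}
```

The permissions are restored top-down because a child cannot be listed or removed until its parent is searchable and writable again.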
>>>>>>>>>>> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu<lei@cloudera.com>
>>>>>>>>>>>> Thanks a lot for looking into HDFS-7722,
>>>>>>>>>>>> In HDFS-7722:
>>>>>>>>>>>> TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>>>>>>>>> TearDown().
>>>>>>>>>>>> TestDataNodeHotSwapVolumes reset permissions in a finally block.
>>>>>>>>>>>> Also I ran mvn test several times on my machine and all tests passed.
>>>>>>>>>>>> However, since in DiskChecker#checkDirAccess():
>>>>>>>>>>>> private static void checkDirAccess(File dir)
>>>>>>>>>>>>     throws DiskErrorException {
>>>>>>>>>>>>   if (!dir.isDirectory()) {
>>>>>>>>>>>>     throw new DiskErrorException("Not a directory: "
>>>>>>>>>>>>                                  + dir.toString());
>>>>>>>>>>>>   }
>>>>>>>>>>>>   checkAccessByFileMethods(dir);
>>>>>>>>>>>> }
>>>>>>>>>>>> One potentially safer alternative is replacing data dir with a
>>>>>>>>>>>> regular file to simulate disk failures.
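A minimal sketch of that alternative, under stated assumptions: `FileAsDirFailureSketch`, `passesDirCheck`, `simulateFailure`, and `restore` are hypothetical names, and `passesDirCheck` is only a stand-in for a check that, like the quoted checkDirAccess, rejects anything that is not a directory. It is not the DiskChecker code itself. The point is that no permission bits are touched, so a test process that dies midway leaves behind a plain file that a later clean can still delete.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical sketch: simulate a failed volume by swapping a dir for a file. */
class FileAsDirFailureSketch {

    /** Stand-in for a health check that first requires a directory. */
    static boolean passesDirCheck(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite() && dir.canExecute();
    }

    /** Replaces an empty data dir with a regular file of the same name. */
    static void simulateFailure(File dataDir) {
        try {
            Files.delete(dataDir.toPath()); // fails if the directory is not empty
            Files.createFile(dataDir.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Puts an empty directory back where the placeholder file was. */
    static void restore(File dataDir) {
        try {
            Files.delete(dataDir.toPath());
            Files.createDirectory(dataDir.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a scratch directory to play the role of a data dir. */
    static File newTempDataDir() {
        try {
            return Files.createTempDirectory("data-dir-sketch").toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File dataDir = newTempDataDir();
        try {
            simulateFailure(dataDir);
            System.out.println("passes check while failed: " + passesDirCheck(dataDir));
        } finally {
            restore(dataDir);
        }
        System.out.println("passes check after restore: " + passesDirCheck(dataDir));
        dataDir.delete();
    }
}
```

This sketch only handles an empty data dir; a real test would have to move or recreate the directory contents, which is left out here.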
>>>>>>>>>>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>>>>>>>> <cnauroth@hortonworks.com>  wrote:
>>>>>>>>>>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>>>>>>>>> TestDataNodeVolumeFailureReporting, and
>>>>>>>>>>>>> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>>>>>>>>>>> from directories like the one Colin mentioned to simulate disk failures
>>>>>>>>>>>>> at data nodes.  I reviewed the code for all of those, and they all appear
>>>>>>>>>>>>> to be doing the necessary work to restore executable permissions at the
>>>>>>>>>>>>> end of the test.  The only recent uncommitted patch I've seen that makes
>>>>>>>>>>>>> changes in these test suites is HDFS-7722.  That patch still looks
>>>>>>>>>>>>> though.  I don't know if there are other uncommitted patches that
>>>>>>>>>>>>> these test suites.
>>>>>>>>>>>>> I suppose it's also possible that the JUnit process died
>>>>>>>>>>>>> after removing executable permissions but before restoring them.  That
>>>>>>>>>>>>> always would have been a weakness of these test suites, regardless of
>>>>>>>>>>>>> any recent changes.
>>>>>>>>>>>>> Chris Nauroth
>>>>>>>>>>>>> Hortonworks
>>>>>>>>>>>>> http://hortonworks.com/
>>>>>>>>>>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers"<atm@cloudera.com>
>>>>>>>>>>>>>> Hey Colin,
>>>>>>>>>>>>>> I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>>>>>>>>>>>>> these boxes. He took a look and concluded that some perms are being
>>>>>>>>>>>>>> set in those directories by our unit tests which are precluding files
>>>>>>>>>>>>>> from getting deleted. He's going to clean up the boxes for us, but we
>>>>>>>>>>>>>> should expect this to keep happening until we can fix the test in question
>>>>>>>>>>>>>> to properly clean up after itself.
>>>>>>>>>>>>>> To help narrow down which commit it was that started this, Andrew sent
>>>>>>>>>>>>>> me this info:
>>>>>>>>>>>>>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>>>>>>>>>>> has 500 perms, so I'm guessing that's the problem. Been that way since 9:32
>>>>>>>>>>>>>> UTC on March 5th."
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Aaron T. Myers
>>>>>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>>>>>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmccabe@apache.org> wrote:
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>> A very quick (and not thorough) survey shows that I can't find any
>>>>>>>>>>>>>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>>>>>>>>>>>>>>> seem to be failing with some variant of this message:
>>>>>>>>>>>>>>> [ERROR] Failed to execute goal
>>>>>>>>>>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>>>>>>>>>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>>>>>>>>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>>>>>>>>>> ->  [Help 1]
>>>>>>>>>>>>>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>>>>>>>>>>>> permissions?
>>>>>>>>>>>>>>> Colin
>>>>>>>>>>>> --
>>>>>>>>>>>> Lei (Eddy) Xu
>>>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>> --
>>>>>>>>> Sean
>>>>>>> --
>>>>>>> Sean
>>>>>> --
>>>>>> Sean
>>>>> --
>>>>> Sean
>>>> --
>>>> Sean
