hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "GithubIntegration" by OwenOMalley
Date Thu, 05 Nov 2015 18:56:33 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "GithubIntegration" page has been changed by OwenOMalley:

New page:
= Github Setup and Pull Requests (PRs) =

There are several ways to set up Git for committers and contributors. Contributors can safely
set up Git any way they choose, but committers should take extra care: they can push new
commits to the trunk at Apache, and various policies there make backing out mistakes problematic.
To keep the commit history clean, take note of the use of --squash below when merging into trunk.

== Git setup for Committers ==

This describes a setup with one local repo and two remotes. It allows you to push the code on
your machine either to your GitHub repo or to git-wip-us.apache.org. Fork GitHub's apache/hadoop
to your own GitHub account; this enables pull requests of your own. Cloning this fork locally
sets up "origin" to point to your remote fork on GitHub as the default remote, so
"git push origin trunk" goes to GitHub.
To attach the Apache Git repo as a second remote, do the following:

git remote add apache https://git-wip-us.apache.org/repos/asf/hadoop.git

To check your remote setup:

git remote -v

You should see something like this:

origin    https://github.com/your-github-id/hadoop.git (fetch)
origin    https://github.com/your-github-id/hadoop.git (push)
apache    https://git-wip-us.apache.org/repos/asf/hadoop.git (fetch)
apache    https://git-wip-us.apache.org/repos/asf/hadoop.git (push)
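A minimal sketch of how the two-remote setup could be checked from a script, assuming a throwaway repository created with mktemp and the placeholder URLs shown above. Nothing is fetched or pushed; the loop only reads the configured remote URLs:

```shell
set -e

# Throwaway repository used only for this illustration (assumption).
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Placeholder URLs from the text above; adding a remote does not contact it.
git remote add origin https://github.com/your-github-id/hadoop.git
git remote add apache https://git-wip-us.apache.org/repos/asf/hadoop.git

# Collect the names of any expected remotes that are not configured.
missing=""
for name in origin apache; do
  if ! git config --get "remote.$name.url" >/dev/null; then
    missing="$missing $name"
  fi
done

apache_url=$(git config --get remote.apache.url)
echo "missing remotes:${missing:- none}"
echo "apache remote: $apache_url"
```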

Now, if you want to experiment with a branch, everything points to your GitHub account by
default because 'origin' is the default remote. You can work as normal using only GitHub until you are
ready to merge with the apache remote. Some conventions will integrate with Apache Jira ticket numbers:

git checkout -b hadoop-xxxx #xxxx typically is a Jira ticket number
#do some work on the branch
git commit -a -m "doing some work"
git push origin hadoop-xxxx # notice pushing to **origin** not **apache**

Once you are ready to commit to the apache remote, you can merge and push your changes directly or,
better yet, create a PR.

== How to create a PR (committers) ==

Push your branch to Github:

git checkout hadoop-xxxx
git rebase apache/trunk # to make it apply to the current trunk
git push origin hadoop-xxxx
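A minimal local sketch of what the rebase step achieves, assuming a throwaway repository in which a local trunk branch stands in for apache/trunk (the file names and commit messages are made up for illustration). After the rebase, the tip of trunk is the merge base of the two branches, so the branch applies on top of the current trunk:

```shell
set -e

# Throwaway repository; the local "trunk" stands in for apache/trunk (assumption).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Example Committer"
git config user.email "committer@example.invalid"

echo base > base.txt
git add base.txt
git commit -q -m "initial commit"

# Work on a topic branch.
git checkout -q -b hadoop-xxxx
echo feature > feature.txt
git add feature.txt
git commit -q -m "doing some work"

# Meanwhile trunk moves ahead with an unrelated change.
git checkout -q trunk
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "unrelated upstream change"

# Replay the topic branch on top of the current trunk.
git checkout -q hadoop-xxxx
git rebase -q trunk

merge_base=$(git merge-base trunk hadoop-xxxx)
trunk_tip=$(git rev-parse trunk)
echo "merge base: $merge_base"
echo "trunk tip:  $trunk_tip"
```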

Go to your hadoop-xxxx branch on GitHub. Because you forked it from GitHub's apache/hadoop,
any PR will default to apache/trunk.
Click the green "Compare, review, and create pull request" button.
You can edit the source and target of the PR if they aren't correct. The "base fork" should be apache/hadoop
unless you are collaborating separately with one of the committers on the list. The "base"
will be trunk. Don't submit a PR to one of the other branches unless you know what you are
doing. The "head fork" will be your forked repo and the "compare" will be your hadoop-xxxx branch.
Click the "Create pull request" button and name the request "HADOOP-XXXX" in all caps. This will
connect the comments of the PR to the mailing list and Jira comments.
From now on the PR lives on GitHub's apache/hadoop. You use the commenting UI there.
If you are looking for a review or sharing with someone else, say so in the comments, but don't
worry about automated merging of your PR; you will have to do that later. The PR is tied to
your branch, so you can respond to comments, make fixes, and commit them from your local repo.
They will appear on the PR page and be mirrored to Jira and the mailing list.
When you are satisfied and want to push it to Apache's remote repo, proceed with "Merging a PR" below.

== How to create a PR (contributors) ==

Create pull requests: [[https://help.github.com/articles/creating-a-pull-request|GitHub PR]].
Pull requests are made to the apache/hadoop repository on GitHub. In the GitHub UI you should
pick the trunk branch as the PR target, as described for committers. The PR will be reviewed
and commented on, so the merge is not automatic. A PR can also be used for discussing contributions
in progress.

== Merging a PR (yours or contributors) ==

Start by reading
[[https://help.github.com/articles/checking-out-pull-requests-locally/|GitHub PR merging locally]].
Remember that a pull request is equivalent to a remote GitHub branch with potentially many
commits. It is recommended to squash the remote commit history so that there is one commit
per issue, rather than merging in all of a contributor's commits. Squash commits achieve this
and also let you close the PR at the same time.
Merging a pull request is equivalent to a "pull" of a contributor's branch:

git checkout trunk      # switch to local trunk branch
git pull apache trunk   # fast-forward to current remote HEAD
git pull --squash https://github.com/cuser/hadoop cbranch  # merge to trunk

--squash ensures that all PR history is squashed into a single commit and allows the committer to use
his/her own message. Read the git help for merge or pull for more information about the --squash option.
In this example we assume that the contributor's GitHub handle is "cuser" and the PR branch
name is "cbranch". Next, resolve conflicts, if any, or ask the contributor to rebase on top
of trunk if the PR has gone out of sync.
If you are ready to merge your own (committer's) PR, you probably only need to merge (not pull),
since you have a local copy that you've been working on. This is the branch that you used
to create the PR.

git checkout trunk      # switch to local trunk branch
git pull apache trunk   # fast-forward to current remote HEAD
git merge --squash hadoop-xxxx
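A minimal local sketch of the effect of git merge --squash, assuming a throwaway repository with no remotes (the file name and the work-in-progress commit messages are made up for illustration). Three commits on the topic branch end up as a single commit on trunk:

```shell
set -e

# Throwaway repository used only for this illustration (assumption).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Example Committer"
git config user.email "committer@example.invalid"

echo base > file.txt
git add file.txt
git commit -q -m "initial commit"

# Three small commits on the topic branch.
git checkout -q -b hadoop-xxxx
for i in 1 2 3; do
  echo "change $i" >> file.txt
  git commit -q -a -m "work in progress $i"
done

# Squash-merge into trunk: changes are staged, but no commit is created yet.
git checkout -q trunk
git merge -q --squash hadoop-xxxx
git commit -q -m "HADOOP-XXXX description"

trunk_commits=$(git rev-list --count trunk)
branch_commits=$(git rev-list --count hadoop-xxxx)
echo "commits on trunk:  $trunk_commits"
echo "commits on branch: $branch_commits"
```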

Remember to run the regular patch checks, build with tests enabled, and update CHANGELOG.
If everything is fine, you can now commit the squashed request along these lines:
git commit -a -m "HADOOP-XXXX description (cuser via your-apache-id) closes apache/hadoop#ZZ"
HADOOP-XXXX is in all caps, and ZZ is the pull request number on the apache/hadoop repository.
Including "closes apache/hadoop#ZZ" will close the PR automatically. More information is found
at [[https://help.github.com/articles/closing-issues-via-commit-messages|GitHub PR closing]].
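The commit message convention can be sketched as a small shell helper. The function name squash_commit_message and its argument order are hypothetical, introduced only for illustration; the placeholders are the ones used in the text:

```shell
# Hypothetical helper: builds a commit message in the documented layout.
# Arguments: issue handle, description, contributor, committer's Apache id, PR number.
squash_commit_message() {
  printf '%s %s (%s via %s) closes apache/hadoop#%s\n' "$1" "$2" "$3" "$4" "$5"
}

msg=$(squash_commit_message "HADOOP-XXXX" "description" "cuser" "your-apache-id" "ZZ")
echo "$msg"
```

The resulting string could then be passed to git commit with -m.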
Next, push to git-wip-us.apache.org:
git push apache trunk
(this will require your Apache handle credentials).
The PR, once pushed, will get mirrored to GitHub. To update your GitHub version, push there as well:

git push origin trunk

Note on squashing: Since squash discards the remote branch history, repeated PRs from the same
remote branch are difficult to merge. The workflow implies that every new PR starts with
a new rebased branch. This is more important for contributors to know than for committers,
because GitHub warns up front if a new PR is not mergeable. Still, watch for duplicate
PRs based on the same source branch; this is bad practice.
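A minimal local sketch of why a squashed branch is awkward to reuse, assuming a throwaway repository (names are made up for illustration). After the squash commit, trunk has the same content as the branch, yet Git does not consider the branch merged because its commits are not ancestors of trunk:

```shell
set -e

# Throwaway repository used only for this illustration (assumption).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Example Committer"
git config user.email "committer@example.invalid"

echo base > file.txt
git add file.txt
git commit -q -m "initial commit"

git checkout -q -b hadoop-xxxx
echo "change" >> file.txt
git commit -q -a -m "doing some work"

git checkout -q trunk
git merge -q --squash hadoop-xxxx
git commit -q -m "HADOOP-XXXX description"

# Same content on both sides ...
if git diff --quiet trunk hadoop-xxxx; then same_content=yes; else same_content=no; fi

# ... but the branch tip is not part of trunk's history.
if git merge-base --is-ancestor hadoop-xxxx trunk; then is_ancestor=yes; else is_ancestor=no; fi

echo "same content: $same_content, branch is ancestor of trunk: $is_ancestor"
```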

== Closing a PR without committing (for committers) ==

To reject a PR (close it without committing), issue an empty commit
on trunk HEAD without merging the PR:

git commit --allow-empty -m "closes apache/hadoop#ZZ *Won't fix*"
git push apache trunk

That should close PR ZZ on the GitHub mirror without merging it and without any code modifications in the
master repository.
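A minimal local sketch of what --allow-empty does, assuming a throwaway repository with no remotes (so nothing is pushed and no PR is actually closed). The new commit carries only a message; its tree is identical to that of its parent:

```shell
set -e

# Throwaway repository used only for this illustration (assumption).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk
git config user.name "Example Committer"
git config user.email "committer@example.invalid"

echo base > file.txt
git add file.txt
git commit -q -m "initial commit"

# Empty commit: records a message without changing any files.
git commit -q --allow-empty -m "closes apache/hadoop#ZZ *Won't fix*"

tree_before=$(git rev-parse 'HEAD~1^{tree}')
tree_after=$(git rev-parse 'HEAD^{tree}')
commit_count=$(git rev-list --count HEAD)
echo "tree before: $tree_before"
echo "tree after:  $tree_after"
```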

== Apache/github integration features ==

Read [[https://blogs.apache.org/infra/entry/improved_integration_between_apache_and|infra
blog]]. Comments and PRs with Hadoop issue handles should post to mailing lists and Jira.
Hadoop issue handles must be in the form HADOOP-YYYY (all capitals). Usually it makes sense
to file a Jira issue first, and then create a PR with the description
HADOOP-YYYY: <jira-issue-description>
In this case all subsequent comments will automatically be copied to Jira without having to
mention the Jira issue explicitly in each comment of the PR.
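A minimal sketch of a check for the handle format, assuming that the part after "HADOOP-" is a numeric Jira ticket number. The function name check_pr_title is hypothetical and not part of any Apache or GitHub tooling:

```shell
# Hypothetical helper: prints "ok" when the title starts with an upper-case
# HADOOP- handle followed by a digit, and "bad" otherwise.
check_pr_title() {
  case "$1" in
    HADOOP-[0-9]*) echo ok ;;
    *)             echo bad ;;
  esac
}

check_pr_title "HADOOP-1234: example description"
check_pr_title "hadoop-1234: example description"
```

Shell case patterns are case-sensitive, so a lower-case handle is rejected, which matches the all-capitals requirement.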
