hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "GitAndHadoop" by AkiraAjisaka
Date Mon, 01 Sep 2014 08:43:49 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "GitAndHadoop" page has been changed by AkiraAjisaka:
https://wiki.apache.org/hadoop/GitAndHadoop?action=diff&rev1=17&rev2=18

Comment:
Use Git for the SCM system for Hadoop instead of SVN

  A lot of people use Git with Hadoop because they have their own patches to make to Hadoop,
and Git helps them manage it.
  
   * GitHub provides some good lessons on Git at [[http://learn.github.com]]
-  * Apache serves up read-only Git versions of their source at [[http://git.apache.org/]].
People cannot commit changes with Git; for that the patches need to be applied to the SVN
repositories
+  * Apache serves up read-only Git versions of their source at [[http://git.apache.org/]].
Committers can commit changes to the writable Git repository. See HowToCommitWithGit
  
  This page tells you how to work with Git. See HowToContribute for instructions on building
and testing Hadoop.
  <<TableOfContents(4)>>
+ 
  
  == Key Git Concepts ==
  The key concepts of Git.
@@ -23, +24 @@

  
  You need a copy of git on your system. Some IDEs ship with Git support; this page assumes
you are using the command line.
  
- Clone a local Git repository from the Apache repository. The Hadoop subprojects (common,
HDFS, and MapReduce) live inside a combined repository called `hadoop-common.git`.
+ Clone the Apache repository to get a local Git repository. The Hadoop subprojects (common,
HDFS, and MapReduce) live inside a combined repository called `hadoop.git`.
  
  {{{
- git clone git://git.apache.org/hadoop-common.git
+ git clone git://git.apache.org/hadoop.git
  }}}
  
  The total download is well over 100MB, so the initial checkout process works best when the
network is fast. Once downloaded, Git works offline, though you will need to perform your
initial builds online so that the build tools (Maven, Ivy, etc.) can download dependencies.
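
  If you want to run that first build straight after cloning, a minimal sketch is below
(assuming a current trunk checkout with Maven installed; older branches build with Ant and
Ivy, and the exact flags may differ):

  {{{
   cd hadoop
   # the first build must be online so Maven can download the dependencies it needs
   mvn install -DskipTests
  }}}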
  
  == Grafts for complete project history ==
  
- The Hadoop project has undergone some movement in where its component parts have been versioned.
Because of that, commands like `git log --follow` needs to have a little help. To graft the
history back together into a coherent whole, insert the following contents into `hadoop-common/.git/info/grafts`:
+ The locations where the Hadoop project's component parts are versioned have moved over time.
Because of that, commands like `git log --follow` need a little help. To graft the
history back together into a coherent whole, insert the following contents into `hadoop/.git/info/grafts`:
  
  {{{
  5128a9a453d64bfe1ed978cf9ffed27985eeef36 6c16dc8cf2b28818c852e95302920a278d07ad0c
@@ -49, +50 @@

  
   1. Create a GitHub login at http://github.com/ and add your public SSH keys
   1. Go to http://github.com/apache and search for the Hadoop and other Apache projects you
want (avro is handy alongside the others)
-  1. For each project, fork in the githb UI. This gives you your own repository URL which
you can then clone locally with {{{git clone}}}
+  1. For each project, fork in the GitHub UI. This gives you your own repository URL which
you can then clone locally with {{{git clone}}}
   1. For each patch, create a branch (see the sketch just below this list).
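
  As a sketch of that per-patch branching (the JIRA number HADOOP-1234 below is only a
placeholder), you might do:

  {{{
   # start a topic branch for one issue from the current trunk
   git checkout -b HADOOP-1234 trunk
  }}}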
  
- At the time of writing (December 2009), GitHub was updating its copy of the Apache repositories
every hour. As the Apache repositories were updating every 15 minutes, provided these frequencies
are retained, a GitHub-fork derived version will be at worst 1 hour and 15 minutes behind
the ASF's SVN repository. If you are actively developing on Hadoop, especially committing
code into the SVN repository, that is too long -work off the Apache repositories instead.

+ At the time of writing (December 2009), GitHub was updating its copy of the Apache repositories
every hour. As the Apache repositories were updating every 15 minutes, provided these frequencies
are retained, a GitHub-fork derived version will be at worst 1 hour and 15 minutes behind
the ASF's Git repository. If you are actively developing on Hadoop, especially committing
code into the Git repository, that is too long, so work off the Apache repositories instead.

  
   1. Clone the read-only repository from GitHub (their recommendation) or from Apache (the
ASF's recommendation)
   1. In that clone, rename the "origin" remote to "apache": {{{git remote rename origin apache}}}
   1. Log in to [[http://github.com]]
   1. Create a new repository there (e.g. hadoop-fork)
   1. In the existing clone, add the new repository as a remote:
-  {{{git remote add -f github git@github.com:MYUSERNAMEHERE/hadoop-common.git}}}
+  {{{git remote add -f github git@github.com:MYUSERNAMEHERE/hadoop.git}}}
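
  To confirm that both remotes are wired up as expected, list them; the output (abbreviated
here) should show both names:

  {{{
   git remote -v
   # apache  git://git.apache.org/hadoop.git (fetch)
   # github  git@github.com:MYUSERNAMEHERE/hadoop.git (fetch)
  }}}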
  
  This gives you a local repository with two remote repositories: "apache" and "github". Apache
has the trunk branch, which you can update whenever you want to get the latest ASF version:
  
@@ -71, +72 @@

  Your own branches can be merged with trunk, and pushed out to GitHub. To generate a patch
for submission to a JIRA issue, check everything into your specific branch, merge that with
(a recently pulled) trunk, then diff the two:
  {{{ git diff --no-prefix trunk > ../hadoop-patches/HADOOP-XYX.patch }}}
  
- If you are working deep in the code it's not only convenient to have a directory full of
patches to the JIRA issues, it's convenient to have that directory a git repository that is
pushed to a remote server, such as [[https://github.com/steveloughran/hadoop-patches|this
example]]. Why? It helps you move patches from machine to machine without having to do all
the updating and merging. From a pure-git perspective this is wrong: it loses history, but
for a mixed git/svn workflow it doesn't matter so much.
+ If you are working deep in the code, it is not only convenient to have a directory full of
patches for the JIRA issues, it is also convenient to make that directory a git repository that is
pushed to a remote server, such as [[https://github.com/steveloughran/hadoop-patches|this
example]]. Why? It helps you move patches from machine to machine without having to do all
the updating and merging. From a pure-git perspective this is wrong: it loses history, but
for a mixed workflow it doesn't matter so much.
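
  A minimal sketch of setting up such a patches repository (the directory name, the GitHub
repository and the HADOOP-XYX patch name are only examples):

  {{{
   mkdir ../hadoop-patches
   cd ../hadoop-patches
   git init
   git remote add origin git@github.com:MYUSERNAMEHERE/hadoop-patches.git
   # after generating a patch into this directory:
   git add HADOOP-XYX.patch
   git commit -m "patch for HADOOP-XYX"
   git push origin master
  }}}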
  
  
  == Branching ==
@@ -135, +136 @@

       long getAvailable() throws IOException {
  
  }}}
- It is essential that patches for JIRA issues are generated with the {{{--no-prefix}}} option.
Without that an extra directory path is listed, and the patches can only be applied with a
{{{patch -p1}}} call, ''which Hudson does not know to do''. If you want your patches to take,
this is what you have to do. You can of course test this yourself by using a command like
{{{patch -p0 << ../outgoing/HDFS-775.1}}} in a copy of the SVN source tree to test that
your patch takes.
+ It is essential that patches for JIRA issues are generated with the {{{--no-prefix}}} option.
Without it an extra directory path is listed, and the patches can only be applied with a
{{{patch -p1}}} call, ''which Hudson does not know to do''. If you want your patches to take,
this is what you have to do. You can of course test this yourself by running a command like
{{{patch -p0 < ../outgoing/HDFS-775.1}}} in a copy of the Git source tree to check that
your patch applies.
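
  If you would rather not modify the tree while checking, GNU patch can do a trial run
instead; a sketch:

  {{{
   # report what would be patched without changing any files
   patch -p0 --dry-run < ../outgoing/HDFS-775.1
  }}}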
  
  === Updating your patch ===
  
  If your patch is not immediately accepted, do not be offended: it happens to us all. It does
introduce a problem, though: your branches become out of date. You need to check out the latest
Apache version, merge your branches with it, and then push the changes back to GitHub:
  
  {{{
-  git co trunk
+  git checkout trunk
   git pull apache
-  git co mybranch
+  git checkout mybranch
   git merge trunk
   git push github mybranch
  }}}
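
  After the merge, the usual follow-up is to regenerate the patch from the refreshed branch
and attach the new version to the JIRA issue; a sketch, reusing the naming convention from
above:

  {{{
   git diff --no-prefix trunk > ../hadoop-patches/HADOOP-XYX.2.patch
  }}}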
@@ -159, +160 @@

  
  === What to do when your patch is committed ===
  
- Once your patch is committed into SVN, you do not need the branch any more. You can delete
it straight away, but it is safer to verify the patch is completely merged in
+ Once your patch is committed into Git, you do not need the branch any more. You can delete
it straight away, but it is safer to verify first that the patch is completely merged in.
  
  Pull down the latest trunk and verify that the patch branch is synchronized:
  
  {{{
-  git co trunk
+  git checkout trunk
   git pull apache
-  git co mybranch
+  git checkout mybranch
   git merge trunk
   git diff trunk
  }}}
@@ -174, +175 @@

  The output of the last command should be nothing: the two branches should be identical.
You can then prove to git that this is true by switching back to the trunk branch and merging
in the branch, an operation which will not change the source tree, but will update Git's branch
graph.
  
  {{{
-  git co trunk
+  git checkout trunk
   git merge mybranch
  }}}
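
  Once that merge confirms the branch is fully in trunk, you can clean up; a sketch that
deletes the branch locally and on GitHub:

  {{{
   git branch -d mybranch
   git push github :mybranch
  }}}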
  
