hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "GitAndHadoop" by SteveLoughran
Date Mon, 07 Dec 2009 16:13:26 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "GitAndHadoop" page has been changed by SteveLoughran.
The comment on this change is: more on branching.
http://wiki.apache.org/hadoop/GitAndHadoop?action=diff&rev1=4&rev2=5

--------------------------------------------------

  == Before you begin ==
  
   1. You need a copy of git on your system. Some IDEs ship with Git support; this page assumes
you are using the command line.
-  1. You need a copy of ant 1.7+ on your system for the builds themselves.
+  1. You need a copy of Ant 1.7+ on your system for the builds themselves.
-  1. You need to be online for your first checkout and build.
+  1. You need to be online for your first checkout and build, and any subsequent build which
needs to download new artifacts from the central JAR repositories.
   1. You need to set Ant up so that it works with any proxy you have. This is documented by [[http://ant.apache.org/manual/proxy.html |the Ant team]]; a minimal example is sketched just after this list.
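
  For example, a minimal sketch for a Unix shell, assuming a proxy at {{{proxy.example.com}}} on port 8080 (both are placeholders for your own settings):
  {{{
  # hypothetical proxy host and port; adjust to match your network
  export ANT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080"
  }}}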
  
  
@@ -35, +35 @@

  }}}
  The total download is well over 100MB, so the initial checkout process works best when the
network is fast. Once downloaded, Git works offline.
  
+ == Forking onto GitHub ==
+ 
+ You can create your own fork of the ASF project and add whatever branches you like. GitHub prefers that you explicitly fork its mirrored copies of the Hadoop projects.
+ 
+  1. Create a GitHub login at http://github.com/ ; add your public SSH keys
+  1. Go to http://github.com/apache and search for the Hadoop and other Apache projects you want (Avro is handy alongside the others)
+  1. For each project, fork. This gives you your own repository URL which you can then clone
locally with {{{git clone}}}
+  1. For each patch, branch; the clone-and-branch commands are sketched below.
+ 
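+ A minimal sketch of that workflow, assuming SSH access to your fork; {{{yourlogin}}} and the issue number are placeholders:
+ {{{
+ # "yourlogin" and HADOOP-1234 are hypothetical examples
+ git clone git@github.com:yourlogin/hadoop-common.git
+ cd hadoop-common
+ git checkout -b HADOOP-1234
+ }}}
+ 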
  == Building the source ==
  
  You need to tell all the Hadoop modules to get a local JAR of the bits of Hadoop they depend on. You do this by making sure your Hadoop version does not match anything public, and by using the "internal" repository of locally published artifacts.
@@ -55, +64 @@

  hadoop-mapred.version=${version}
  }}}
  
+ The {{{resolvers}}} property tells Ivy to look in the local Maven artifact repository for versions of the Hadoop artifacts; if you don't set this, only published JARs from the central repository will get picked up.
+ 
+ The version property, and the per-module version properties derived from it, tell Hadoop which version of artifacts to create and use. Set this to something different from (ideally ahead of) whatever is being published, to ensure that your own artifacts are picked up.
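+ 
+ A sketch of what the complete file might look like; the {{{resolvers}}}, {{{version}}} and {{{hadoop-mapred.version}}} entries come from the text above, while the other property names and the version value are assumptions, so check them against your checkout:
+ {{{
+ # "internal" makes Ivy look at locally published artifacts first
+ resolvers=internal
+ # a made-up version string that will not clash with anything published
+ version=0.22.0-dev-example
+ hadoop-common.version=${version}
+ hadoop-hdfs.version=${version}
+ hadoop-mapred.version=${version}
+ }}}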
+ 
  Next, symlink this file to every Hadoop module. Now a change in the file gets picked up
by all three.
  {{{
- ln -s build.properties hadoop-common/build.properties
- ln -s build.properties hadoop-hdfs/build.properties
- ln -s build.properties hadoop-mapreduce/build.properties
+ pushd hadoop-common; ln -s ../build.properties build.properties; popd
+ pushd hadoop-hdfs; ln -s ../build.properties build.properties; popd
+ pushd hadoop-mapreduce; ln -s ../build.properties build.properties; popd
  }}}
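
  To confirm the links point the right way (each module's copy should resolve to the shared file in the parent directory):
  {{{
  ls -l hadoop-common/build.properties hadoop-hdfs/build.properties hadoop-mapreduce/build.properties
  }}}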
  
- You are all set up to build.
+ You are now all set up to build.
  
  === Build Hadoop ===
  
@@ -72, +85 @@

  
  This Ant target not only builds the JAR files, it copies them to the local {{{${user.home}/.m2}}} directory, where they will be picked up by the "internal" resolver. You can check that this is taking place by running {{{ant ivy-report}}} on a project and seeing where it gets its dependencies.
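
  For example (the choice of module here is arbitrary):
  {{{
  cd hadoop-hdfs
  ant ivy-report
  # inspect the generated HTML report to see where each Hadoop dependency was resolved from
  }}}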
  
- If there are problems, don't be afraid to {{{rm -rf ~/.m2/repository/org/apache/hadoop}}}
 and {{{rm -rf ~/.ivy2/cache/org.apache.hadoop}}} to remove local copies of artifacts.
- 
  === Testing ===
  
  Each project comes with lots of tests; run {{{ant test}}} to run them. If you have made
changes to the build and tests fail, it may be that the tests never worked on your machine.
Build and test the unmodified source first. Then keep an eye on both the main source and any
branch you make. A good way to do this is to give a Continuous Integration server such as
Hudson this job: checking out, building and testing both branches.
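
  A minimal sketch of "build and test the unmodified source first, then your branch"; the branch names are hypothetical, and this assumes the mirrored main branch is called {{{trunk}}}:
  {{{
  # hypothetical branch names
  git checkout trunk
  ant test
  git checkout HADOOP-1234
  ant test
  }}}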
  
+ == Branching ==
+ 
+ Git makes it easy to branch. The recommended process for working with Apache projects is one branch per JIRA issue. That makes it easy to isolate and track the development of each change. It does mean that if you release a branch of your own, one that merges in more than one issue, you have to invest some effort in merging everything together. Try not to make changes in different branches that are hard to merge.
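+ 
+ For example, a sketch of pulling several issue branches into a private release branch; all branch names here are hypothetical:
+ {{{
+ # hypothetical branch names
+ git checkout my-release-branch
+ git merge HADOOP-1234
+ git merge HDFS-567
+ }}}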
+ 
+ One thing to look out for is making sure that you build the different Hadoop projects consistently from the same branch, and that you have not published artifacts from one branch and then built against another. This is because both Ivy and Maven publish artifacts to shared repository cache directories.
+ 
+  1. Don't be afraid to {{{rm -rf ~/.m2/repository/org/apache/hadoop}}}  and {{{rm -rf ~/.ivy2/cache/org.apache.hadoop}}}
to remove local copies of artifacts.
+  1. Use different version properties in different branches to ensure that different versions are not accidentally picked up (see the sketch after this list).
+  1. Avoid using {{{latest.version}}} as the version marker in Ivy, as that gives you whatever was built last, regardless of branch.
+  1. Don't build/test different branches simultaneously, such as by running Hudson on your local machine while developing on the console. The trick here is to bring up Hudson in a virtual machine, running against the Git repository on your desktop; Git lets you do this, so you can run Hudson against your private branch.
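+ 
+ For example (a sketch; the version strings are made up), the shared {{{build.properties}}} on two different issue branches might differ only in the version property:
+ {{{
+ # on the branch for one issue
+ version=0.22.0-HADOOP-1234
+ # on the branch for another issue
+ version=0.22.0-HDFS-567
+ }}}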
+ 
