hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "HowToContribute" by ArpitAgarwal
Date Thu, 04 Sep 2014 21:57:32 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HowToContribute" page has been changed by ArpitAgarwal:
https://wiki.apache.org/hadoop/HowToContribute?action=diff&rev1=96&rev2=97

Comment:
Remove some duplicated content, point to BUILDING.txt, moved protobuf installation to separate
page.

  
  <<TableOfContents(4)>>
  
- === Setting up ===
+ == Dev Environment Setup ==
+ Here are some things you will need to build and test Hadoop. Be prepared to invest some
time to set up a working Hadoop dev environment. Try getting the project to build and test
locally first before you start writing code.
  
- Here are some things you will need to build and test Hadoop. It does take some time to set
up a working Hadoop development environment, so be prepared to invest some time. Before you
actually begin trying to code in it, try getting the project to build and test locally first.
This is how you can test your installation.
+ === Get the source code ===
+ First of all, you need the Hadoop source code. The official location for Hadoop is the Apache
Git repository. See GitAndHadoop for more details.
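+ 
+ For example, cloning the repository might look like this (the clone URL below is inferred
from the ASF git hosting of the time; see GitAndHadoop for the authoritative instructions and
mirror list):
+ {{{
+ git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
+ cd hadoop
+ }}}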
  
- ==== Software Configuration Management (SCM) ====
+ === Read BUILDING.txt ===
+ Once you have the source code, we strongly recommend reading BUILDING.txt located in the
root of the source tree. It has up-to-date information on how to build Hadoop on various platforms
along with some workarounds for platform-specific quirks. The latest [[https://git-wip-us.apache.org/repos/asf?p=hadoop.git;a=blob;f=BUILDING.txt|BUILDING.txt]]
for the current trunk can also be viewed on the web.
  
- The SCM system for Hadoop is moved to Git. See GitAndHadoop for more details.
  
- ==== Integrated Development Environment (IDE) ====
+ === Integrated Development Environment (IDE) ===
- 
- You are free to use whatever IDE you prefer, or your favorite command line editor. Note
that
+ You are free to use whatever IDE you prefer or your favorite text editor. Note that:
-  * Building and testing is often done on the command line, or at least via the Maven support
in the IDEs.
+  * Building and testing is often done on the command line or at least via the Maven support
in the IDEs.
   * Set up the IDE to follow the source layout rules of the project.
-  * If you have commit rights to the repository, disable any added value "reformat" and "strip
trailing spaces" features on commits, as it can create extra noise.
+  * Disable any added value "reformat" and "strip trailing spaces" features, as they can create
extra noise when reviewing patches.
  
- ==== Build Tools ====
+ === Build Tools ===
+  * A Java Development Kit. The Hadoop developers recommend [[http://java.com/|Oracle Java
7]]. You may also use [[http://openjdk.java.net/|OpenJDK]].
+  * Google Protocol Buffers. Check out the ProtocolBuffers guide for help installing protobuf.
+  * [[http://maven.apache.org/|Apache Maven]] version 3 or later (for Hadoop 0.23+)
- 
- To build the code, install (as well as the programs needed to build Hadoop on Windows, if
that is your development platform)
-  * [[http://maven.apache.org/|Apache Maven]]
-  * [[http://java.com/|Oracle Java 6 or 7]], or [[http://openjdk.java.net/|OpenJDK]]
- These should also be on your PATH; test by executing {{{mvn}}} and {{{javac}}} respectively.
- 
- As the Hadoop builds use the external Maven repository to download artifacts, Maven needs
to be set up with the proxy settings needed to make external HTTP requests. You will also
need to be online for the first builds of every Hadoop project, so that the dependencies can
all be downloaded.
- 
- === Other items ===
- 
-  * A Java Development Kit is required to be installed and on the path of executables. The
Hadoop developers recommend the Oracle JDK.
-  * The source code of projects that you depend on. Avro, Jetty, Log4J are some examples.
This isn't compulsory, but as the source is there, it helps you see what is going on.
-  * The source code of the Java version that you are using. Again: handy.
   * The Java API javadocs.
-  * the {{{diff}}} and {{{patch}}} commands, which ship with Unix/Linux systems, and come
with cygwin.
+ Ensure these are installed and on your PATH by executing {{{mvn}}}, {{{git}}} and {{{javac}}}.
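+ 
+ For example, a quick sanity check of the toolchain ({{{protoc}}} comes with the Protocol Buffers
install above; exact version output will vary):
+ {{{
+ mvn -version
+ javac -version
+ git --version
+ protoc --version
+ }}}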
+ 
+ As the Hadoop builds use the external Maven repository to download artifacts, Maven needs
to be set up with the proxy settings needed to make external HTTP requests. The first build
of every Hadoop project needs internet connectivity to download Maven dependencies.
+  1. Be online for that first build, on a good network
+  1. To set the Maven proxy settings, see http://maven.apache.org/guides/mini/guide-proxies.html
+  1. Because Maven doesn't pass proxy settings down to the Ant tasks it runs ([[https://issues.apache.org/jira/browse/HDFS-2381|HDFS-2381]]),
some parts of the Hadoop build may fail. The fix for this is to pass down the Ant proxy settings
in the build (Unix: {{{mvn $ANT_OPTS}}}; Windows: {{{mvn %ANT_OPTS%}}}), as in the sketch after this list.
+  1. Tomcat is always downloaded, even when building offline.  Setting {{{-Dtomcat.download.url}}}
to a local copy and {{{-Dtomcat.version}}} to the version pointed to by the URL will avoid
that download.
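+ 
+ A minimal sketch of the ANT_OPTS workaround (the proxy host and port below are placeholders;
use the same values you configured for Maven):
+ {{{
+ export ANT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080"
+ mvn install $ANT_OPTS
+ }}}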
+ 
  
  === Native libraries ===
- 
- On Unix, you need the tools to create the native libraries: LZO headers,zlib headers, gcc,
OpenSSL headers, cmake, protobuf dev tools, and libtool, and the GNU autotools (automake,
autoconf, etc).
+ On Linux, you need the tools to create the native libraries: LZO headers, zlib headers, gcc,
OpenSSL headers, cmake, protobuf dev tools, libtool, and the GNU autotools (automake,
autoconf, etc).
  
  For RHEL (and hence also CentOS):
  {{{
- yum -y install  lzo-devel  zlib-devel  gcc autoconf automake libtool 
+ yum -y install  lzo-devel  zlib-devel  gcc autoconf automake libtool
  }}}
  
  For Debian and Ubuntu:
@@ -51, +48 @@

  apt-get -y install maven build-essential autoconf automake libtool cmake zlib1g-dev pkg-config
libssl-dev
  }}}
  
+ Native libraries are mandatory for Windows. For instructions see Hadoop2OnWindows.
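+ 
+ With the native toolchain installed, a full native build can be exercised with an invocation
along these lines (a sketch; check BUILDING.txt for the profiles and flags current for your branch):
+ {{{
+ mvn package -Pdist,native -DskipTests -Dtar
+ }}}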
+ 
- ==== Hardware Setup ====
+ === Hardware Setup ===
- 
   * Lots of RAM, especially if you are using a modern IDE. ECC RAM is recommended in large-RAM
systems.
   * Disk Space. Always handy.
   * Network Connectivity. Hadoop tests are not guaranteed to all work if a machine does not
have a network connection, especially if it does not know its own name.
   * Keep your computer's clock up to date via an NTP server, and set up the time zone correctly.
This is good for avoiding change-log confusion.
  
- === Getting the source code ===
- First of all, you need the Hadoop source code. The official location for Hadoop is the Apache
Git repository. See GitAndHadoop
- 
- === Building ProtocolBuffers (for 0.23+) ===
- 
- Hadoop 0.23+ must have Google's ProtocolBuffers for compilation to work. These are native
binaries which need to be downloaded, compiled and then installed locally.  See [[https://git-wip-us.apache.org/repos/asf?p=hadoop.git;a=blob_plain;f=BUILDING.txt;hb=HEAD|BUILDING.txt]].

- 
- This is a good opportunity to get the GNU C/C++ toolchain installed, which is useful for
working on the native code used in the HDFS project.
- 
- To install and use ProtocolBuffers
- 
- ==== Unix ====
- 
- Install the protobuf packages ''provided they are current enough'' -see the README file
for the current version. If they are too old, uninstall any version you have and follow the
instructions.
- 
- ==== Local build and installation ====
- 
-  * you need a copy of GCC 4.1+ including the {{{g++}}} C++ compiler, {{{make}}} and the
rest of the GNU C++ development chain.
-  * Linux: you need a copy of autoconf installed, which your local package manager will do
-along with automake.
-  * Download the version of protocol buffers that the BUILDING.txt recommends from [[http://code.google.com/p/protobuf/
|the protocol buffers project]].
-  * unzip it/untar it
-  * {{{cd}}} into the directory that has been created.
-  * run {{{./configure}}}
-  * If configure fails with "C++ preprocessor "/lib/cpp" fails sanity check" that means you
don't have g++ installed. Install it.
-  * run {{{make}}} to build the libraries.
-  * on a Unix system, after building the libraries, you must install it ''as root''. {{{su}}}
to root, then run {{{make install}}}
- 
- ==== Testing your Protocol Buffers installation ====
- 
- The test for this is verifying that {{{protoc}}} is on the command line. You should expect
something like
- 
- {{{
- $ protoc
- Missing input file.
- }}}
- 
- You may see the error message
- {{{
- $ protoc
- protoc: error while loading shared libraries: libprotobuf.so.7: cannot open shared object
file: No such file or directory
- }}}
- 
- This is a [[http://code.google.com/p/protobuf/issues/detail?id=213 |known issue]] for Linux,
and is caused by a stale cache of libraries. Run {{{ldconfig}}} and try again.
- 
- === Making Changes ===
+ == Making Changes ==
  Before you start, send a message to the [[http://hadoop.apache.org/core/mailing_lists.html|Hadoop
developer mailing list]], or file a bug report in [[Jira]].  Describe your proposed changes
and check that they fit in with what others are doing and have planned for the project.  Be
patient, it may take folks a while to understand your requirements.  If you want to start
with pre-existing issues, look for Jiras labeled `newbie`.
  
  Modify the source code and add some (very) nice features using your favorite IDE.<<BR>>
@@ -125, +79 @@

    * You can run all the Common unit tests with {{{mvn test}}}, or a specific unit test with
{{{mvn -Dtest=<class name without package prefix> test}}}. Run these commands from the
{{{hadoop-trunk}}} directory, as in the example below.
   * If you modify the Unix shell scripts, see the UnixShellScriptProgrammingGuide.
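+ 
+ For example, to run one test class from Common (the class name here is only an illustration;
substitute the test you care about):
+ {{{
+ cd hadoop-trunk
+ mvn -Dtest=TestFileUtil test
+ }}}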
  
- ==== Using Maven ====
- Hadoop 0.23 and later is built using [[http://maven.apache.org/|Apache Maven]], version
3 or later.
- 
- Maven likes to download things, especially on the first run.
-  1. Be online for that first build, on a good network
-  1. To set the Maven proxy setttings, see http://maven.apache.org/guides/mini/guide-proxies.html
-  1. Because Maven doesn't pass proxy settings down to the Ant tasks it runs [[https://issues.apache.org/jira/browse/HDFS-2381|HDFS-2381]]
some parts of the Hadoop build may fail. The fix for this is to pass down the Ant proxy settings
in the build Unix: {{{mvn $ANT_OPTS}}}; windows   {{{mvn %ANT_OPTS%}}}.
-  1. Tomcat is always downloaded, even when building offline.  Setting {{{-Dtomcat.download.url}}}
to a local copy and {{{-Dtomcat.version}}} to the version pointed to by the URL will avoid
that download.
- 
  === Generating a patch ===
  ==== Choosing a target branch ====
  Except for the following situations it is recommended that all patches be based off trunk
to take advantage of the Jenkins pre-commit build.
@@ -214, +159 @@

  It's OK to upload a new patch to Jira with the same name as an existing patch. If you select
the "Activity>All" tab then the different versions are linked in the comment stream, providing
context. However, many contributors find it convenient to add a numeric suffix to the patch
indicating the patch revision, e.g. hdfs-1234.01.patch, hdfs-1234.02.patch, etc.
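+ 
+ For example, assuming your change lives on a local branch created from trunk, a numbered patch
file can be produced with something like the following (see the patch-generation guidance above
for the exact diff flags the pre-commit build expects):
+ {{{
+ git diff trunk > hdfs-1234.01.patch
+ }}}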
  
  
- ==== Testing your patch ====
+ === Testing your patch ===
  Before submitting your patch, you are encouraged to run the same tools that the automated
Jenkins patch test system will run on your patch.  This enables you to fix problems with your
patch before you submit it. The {{{dev-support/test-patch.sh}}} script in the trunk directory
will run your patch through the same checks that Jenkins currently does ''except'' for executing
the unit tests.
  
  Run this command from a clean workspace (i.e. {{{git status}}} shows no modifications or additions)
as follows:
@@ -233, +178 @@

  
  Run the same command with no arguments to see the usage options.
  
- ==== Applying a patch ====
+ === Applying a patch ===
  To apply a patch that you either generated or found on JIRA, you can issue
  
  {{{
- patch -p0 < cool_patch.patch
+ git apply -p0 cool_patch.patch
  }}}
- if you just want to check whether the patch applies you can run patch with --dry-run option
  
- {{{
- patch -p0 --dry-run < cool_patch.patch
- }}}
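+ 
+ If you only want to check whether the patch applies cleanly, without changing any files,
{{{git apply}}} also accepts a {{{--check}}} flag:
+ {{{
+ git apply -p0 --check cool_patch.patch
+ }}}
+ 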
  If you are an Eclipse user, you can apply a patch by: 1. Right-click the project name in Package
Explorer, 2. Team -> Apply Patch
  
- ==== Changes that span projects ====
+ === Changes that span projects ===
  You may find that you need to modify both the common project and MapReduce or HDFS. Or perhaps
you have changed something in common, and need to verify that these changes do not break the
existing unit tests for HDFS and MapReduce. Hadoop's build system integrates with a local
maven repository to support cross-project development. Use this general workflow for your
development:
  
   * Make your changes in common
@@ -259, +200 @@

   * Switch to the dependent project and make any changes there (e.g., that rely on a new
API you introduced in hadoop-common).
   * Finally, create separate patches for your common and hdfs/mapred changes, and file them
as separate JIRA issues associated with the appropriate projects.
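+ 
+ A sketch of that cycle on the command line (the module directories below are illustrative;
adjust them to the modules you actually changed, and use the install/test goals described above):
+ {{{
+ cd hadoop-common-project/hadoop-common
+ mvn install -DskipTests
+ cd ../../hadoop-hdfs-project/hadoop-hdfs
+ mvn test
+ }}}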
  
- === Contributing your work ===
+ == Contributing your work ==
  Finally, patches should be ''attached'' to an issue report in [[http://issues.apache.org/jira/browse/HADOOP|Jira]]
via the '''Attach File''' link on the issue's Jira. Please add a comment that asks for a code
review following our [[CodeReviewChecklist|code review checklist]]. Please note that the attachment
should be granted license to ASF for inclusion in ASF works (as per the [[http://www.apache.org/licenses/LICENSE-2.0|Apache
License]] §5).
  
  When you believe that your patch is ready to be committed, select the '''Submit Patch'''
link on the issue's Jira.  Submitted patches will be automatically tested against "trunk"
by [[http://hudson.zones.apache.org/hudson/view/Hadoop/|Hudson]], the project's continuous
integration engine.  Upon test completion, Hudson will add a success ("+1") message or failure
("-1") to your issue report in Jira.  If your issue contains multiple patch versions, Hudson
tests the last patch uploaded.
