The following page has been changed by SuzanneMatthews:
These instructions are for installing and running Hadoop on an OS X single-node cluster (Mac Pro). This tutorial follows the same format and largely the same steps as the incredibly thorough and well-written tutorial by Michael Noll about Ubuntu cluster setup. This is pretty much his procedure, with changes made for OS X users. I also added other things that I pieced together from the Hadoop Quickstart and the forums/archives.
== Step 1: Creating a designated hadoop user on your system ==
This isn't -entirely- necessary, but it's a good idea for security reasons.
To add a user, go to:
System Preferences > Accounts
Click the "+" button near the bottom of the account list. You may need to unlock this ability by hitting the lock icon at the bottom corner and entering the admin username and password.
When the New Account window appears, enter a name, a short name, and a password. I entered the following:
Short name: Hadoop
Password: MyPassword (well you get the idea)
Once you are done, hit "create account".
Now, log in as the hadoop user. You are ready to set up everything!
== Step 2: Install/Configure Preliminary Software ==
Before installing Hadoop, there are several things that you need to make sure you have on your system.
1. Java, and the latest version of the JDK
Because OS X is awesome, you actually don't have to install these things. However, you will have to enable and update what you have. Let's start with Java:
=== Updating Java ===
Open up the Terminal application. If it's not already on your dock, you can access it through
Applications > Utilities > Terminal
Next check to see the version of Java that's currently available on the system:
$:~ java -version
java version "1.5.0_13"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)
You may want to update this to Sun Java 6, which is available as an update for OS X 10.5 (Update 1). It's currently only available for 64-bit machines, though. You can download it here.
After you download and install the update, you are going to need to configure Java on your system so the default points to this new update.
Applications > Utilities > Java > Java Preferences
Under "Java Version" hit the radio button next to "Java SE 6"
Down by "Java Application Runtime Settings" change the order so Java SE 6 (64 bit) is first, followed by Java SE 5 (64 bit) and so on.
Hit "Save" and close this window.
Now, when you go to the terminal, and type in "java -version" you should get the following:
$:~ java -version
java version "1.6.0_05"
Java(TM) SE Runtime Environment (build 1.6.0_05-b13-120)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_05-b13-52, mixed mode)
and for "javac -version":
$:~ javac -version
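If you'd rather script the version check than eyeball it, here is a minimal sketch. The function names parse_java_version and java_at_least_16 are my own (hypothetical helpers, not part of Java or Hadoop), and it assumes the first line of "java -version" looks like the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `java -version` output on stdin and print the
# quoted version from the first line, e.g. 1.6.0_05
parse_java_version() {
    sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Hypothetical helper: succeed when a version string such as "1.6.0_05"
# is Java 1.6 or newer.
java_at_least_16() {
    version=$1
    major=${version%%.*}      # "1" from "1.6.0_05"
    rest=${version#*.}        # "6.0_05"
    minor=${rest%%.*}         # "6"
    # Refuse anything that is not a plain number instead of guessing.
    case "$major" in ''|*[!0-9]*) return 1 ;; esac
    case "$minor" in ''|*[!0-9]*) return 1 ;; esac
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]; }
}

# `java -version` writes to stderr, so redirect it into the pipe.
if command -v java >/dev/null 2>&1; then
    v=$(java -version 2>&1 | parse_java_version)
    if java_at_least_16 "$v"; then
        echo "Java $v is new enough"
    else
        echo "Java ${v:-(unknown)} is older than 1.6; check Java Preferences"
    fi
fi
```

If this still reports 1.5 after the update, the ordering in Java Preferences is the first thing to recheck.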
=== SSH: Setting up Remote Login and Enabling Self-Login ===
SSH also comes installed on your Mac. However, you need to enable access to your own machine (so hadoop doesn't ask you for a password at inconvenient times).
To do this, go to System Preferences > Sharing (under Internet & Network).
Under the list of services, check "Remote Login". For extra security, you can hit the radio button for "Only these users" and select hadoop.
Now, we're going to configure things so we can log into localhost without being asked for a password. Type the following into the terminal:
$:~ ssh-keygen -t rsa -P ""
$:~ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$:~ ssh localhost
You should be able to log in without a problem.
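One thing to watch: running the "cat ... >> authorized_keys" line more than once appends the same key again each time, and sshd may ignore an authorized_keys file whose permissions are too open. Here is a minimal sketch of a safer version. The function name authorize_key is hypothetical (my own, not an ssh tool), and it assumes the public key file holds a single line, as id_rsa.pub does.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to <ssh_dir>/authorized_keys only
# if that exact line is not already present, then tighten permissions.
authorize_key() {
    pubkey_file=$1
    ssh_dir=$2
    mkdir -p "$ssh_dir" || return 1
    touch "$ssh_dir/authorized_keys" || return 1
    # -q quiet, -x match the whole line, -F treat the key as a fixed string
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage on the hadoop account, after ssh-keygen has created the key pair:
#   authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh"
```

After this, "ssh localhost" should behave the same as with the plain cat command, just without piling up duplicate entries.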
You are now ready to install Hadoop. Let's go to step 3!
== Step 3: Downloading and Installing Hadoop ==
This actually involves two smaller steps:
1. Downloading and Unpacking Hadoop
2. Configuring Hadoop
After we finish these, you should be ready to go! So let's get started:
=== Downloading and Unpacking Hadoop ===
Download Hadoop. Make sure you download the latest version (as of this blogpost, 0.17.2 and 0.18.0 are the latest versions). This tutorial refers to whichever version you downloaded as hadoop-*.
Unpack the hadoop-*.tar.gz in the directory of your choice. I placed mine in /Users/hadoop. You may also want to set the ownership of the unpacked directory:
$:~ tar -xzvf hadoop-*.tar.gz
$:~ chown -R hadoop hadoop-*
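If you want to unpack somewhere other than the current directory, the step above can be sketched as a small function. The name unpack_hadoop is hypothetical (my own), and it assumes the tarball contains a single top-level directory, which is how the Hadoop releases I have seen are packaged.

```shell
#!/bin/sh
# Hypothetical helper: unpack a Hadoop tarball into a target directory and
# print the name of the top-level directory it contains.
unpack_hadoop() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # List the archive first to learn the top-level directory name.
    top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    [ -n "$top" ] || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    echo "$top"
}

# Usage (the tarball name is only an example):
#   dir=$(unpack_hadoop hadoop-0.18.0.tar.gz /Users/hadoop)
#   chown -R hadoop "/Users/hadoop/$dir"
```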
=== Configuring Hadoop ===
There are two files that we want to modify when we configure Hadoop. The first is conf/hadoop-env.sh. Open this in nano or your favorite text editor and do the following:
- uncomment the export JAVA_HOME line and set it to /Library/Java/Home
- uncomment the export HADOOP_HEAPSIZE line and keep it at 2000
You may want to change other settings as well, but I chose to leave the rest of hadoop-env.sh the same. Here is an idea of what part of mine looks like:
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME=/Library/Java/Home
# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=2000
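The two edits to conf/hadoop-env.sh described above can also be scripted instead of done by hand. This is a minimal sketch: set_env_export is a hypothetical helper of my own, and it assumes the file already contains an "export NAME=..." line (commented out or not) for each variable, and that the value contains no "|" or "&" characters. It writes to a scratch file rather than using "sed -i", because that flag behaves differently in the BSD sed that ships with OS X and in GNU sed.

```shell
#!/bin/sh
# Hypothetical helper: uncomment (or overwrite) the "export NAME=..." line
# in a hadoop-env.sh style file and set it to the given value.
set_env_export() {
    file=$1
    name=$2
    value=$3
    scratch="$file.scratch.$$"
    # Match the line whether or not it is commented out with "#".
    sed "s|^#* *export $name=.*|export $name=$value|" "$file" > "$scratch" || return 1
    # Copy the contents back so the original file keeps its permissions.
    cat "$scratch" > "$file" && rm -f "$scratch"
}

# Usage, run from inside the unpacked hadoop-* directory:
#   set_env_export conf/hadoop-env.sh JAVA_HOME /Library/Java/Home
#   set_env_export conf/hadoop-env.sh HADOOP_HEAPSIZE 2000
```

Lines that do not match are left exactly as they were, so the rest of hadoop-env.sh stays the same.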
The next file that we need to set up is hadoop-site.xml. The most important things to do here are to set hadoop.tmp.dir (to the directory of your choice) and to add the mapred.tasktracker.maximum property to the file. This effectively sets the maximum number of tasks that a task tracker can run simultaneously. You should also set the value of dfs.replication to one.
Below is a sample hadoop-site.xml file: