hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "QuickStart" by GlenMazza
Date Thu, 29 Nov 2012 01:50:04 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "QuickStart" page has been changed by GlenMazza:

Removed duplicate information already available on the Hadoop Site, providing links instead
to that information (remaining Apache information should eventually be incorporated into the

   * [[http://www.cloudera.com/hadoop-deb|Debian Packages for Debian based systems]] (Debian,
Ubuntu, etc)
   * [[http://www.cloudera.com/hadoop-ec2|AMI for Amazon EC2]]
- If you want to work exclusively with Hadoop code directly from Apache, the rest of this
document can help you get started quickly from there.
+ If you want to work exclusively with Hadoop code directly from Apache, the following articles
from the website will be most useful:
+  * [[http://hadoop.apache.org/docs/stable/single_node_setup.html|Single-Node Setup]]
+  * [[http://hadoop.apache.org/docs/stable/cluster_setup.html|Cluster Setup]]
+ Note for the above Apache links: if you're having trouble getting "ssh localhost" to work on the following operating systems:
- The instructions below are
- based on the docs found at the [[http://hadoop.apache.org/common/docs/current/cluster_setup.html#Configuration | Hadoop Cluster Setup/Configuration]].
- Please note the instructions were last updated to match Release 0.21.0. Things may have
changed since then. If they have, please update this page.
- == Requirements ==
-  * Java 1.6+ (see HadoopJavaVersions for 1.6.X version details)
-  * ssh and sshd
-  * rsync
- == Preparatory Steps ==
- Download
- '''Release Versions:''' can be found at http://hadoop.apache.org/core/releases.html
- '''Subversion:'''
- First check that the current build isn't borked
- http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/
- Then grab the latest with subversion 
- {{{svn co http://svn.apache.org/repos/asf/hadoop/core/trunk hadoop}}}
- Run the following commands:
- {{{
- cd hadoop
- ant 
- ant examples
- bin/hadoop
- }}}
- `bin/hadoop` should display the basic command-line help and let you know it's at least basically working. If any of the above steps failed, use Subversion to roll back to an earlier day's revision.
- == Stage 1: Standalone Operation ==
- By default, Hadoop is configured to run things in a non-distributed mode, as a single Java
process. This is useful for debugging, and can be demonstrated as follows:
- {{{
- mkdir input
- cp conf/*.xml input
- bin/hadoop jar hadoop-mapred-examples-0.21.0.jar grep input output 'security[a-z.]+'
- cat output/*
- }}}
- Obviously the version number on the jar may have changed by the time you read this. You should see a lot of INFO-level log messages go by when you run it, and `cat output/*` should give you something that looks like this:
- {{{
- cat output/*
- 1	security.task.umbilical.protocol.acl
- 1	security.refresh.policy.protocol.acl
- 1	security.namenode.protocol.acl
- 1	security.job.submission.protocol.acl
- 1	security.inter.tracker.protocol.acl
- 1	security.inter.datanode.protocol.acl
- 1	security.datanode.protocol.acl
- ...(and so on)
- }}}
- If you saw the error `Exception in thread "main" java.lang.NoClassDefFoundError: hadoop-mapred-examples-0/21/0/jar` it means you forgot to type `jar` after `bin/hadoop`. If you were unable to run this example, roll back to a previous night's version. If it seemed to run fine but cat didn't spit anything out, you probably mistyped something. Try copying the command directly from the wiki to avoid typos. You'll need to wipe out the output directory between each run.
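The between-runs cleanup mentioned above can be sketched as a small shell sequence (a sketch only; the hadoop invocation is commented out since it assumes a local build, and the scratch directory stands in for the real checkout):

```shell
# Work in a scratch directory for illustration (a real run happens in
# the hadoop checkout itself).
work=$(mktemp -d)
cd "$work"
mkdir -p input                 # sample input, as in the walkthrough
rm -rf output                  # the example job fails if 'output' already exists
# cp conf/*.xml input          # requires a Hadoop checkout
# bin/hadoop jar hadoop-mapred-examples-0.21.0.jar grep input output 'security[a-z.]+'
```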
- Congratulations, you have just successfully run your first MapReduce job with Hadoop.
- == Stage 2: Pseudo-distributed Configuration ==
- You can in fact run everything on a single host. To run things this way, put the following properties in your configuration (all in `conf/hadoop-site.xml` for versions < 0.20; releases 0.20 and later split them across `conf/core-site.xml`, `conf/mapred-site.xml`, and `conf/hdfs-site.xml`)
- {{{
- <configuration>
-   <property>
-     <name>fs.default.name</name>
-     <value>hdfs://localhost:9000</value>
-   </property>
-   <property>
-     <name>mapred.job.tracker</name>
-     <value>localhost:9001</value>
-   </property>
-   <property>
-     <name>dfs.replication</name>
-     <value>1</value>
- 	<!-- set to 1 to reduce warnings when 
- 	running on a single node -->
-   </property>
- </configuration>
- }}}
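For releases 0.20 and later, the combined block above maps onto the split file layout roughly like this (a sketch; values illustrative):

```xml
<!-- conf/core-site.xml (0.20+): filesystem endpoint -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml (0.20+): JobTracker address -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml (0.20+): replication of 1 for a single node -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```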
- Now check that the command 
- `ssh localhost`
- does not require a password. If it does, set up passwordless ssh. For example, you can execute
the following commands:
- {{{
- ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
- cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
- }}}
- Now, try `ssh localhost` again. If this doesn't work you're going to have to figure out what's going on with your `ssh-agent` on your own.
  '''Windows Users''' To start the ssh server, you need to run "ssh-host-config -y" in a Cygwin environment. If it asks for the CYGWIN environment variable value, set it to "ntsec tty". Afterwards you can start the server from Cygwin with "cygrunsrv --start sshd" or from the Windows command line with "net start sshd".
  '''Mac Users''' In recent versions of OS X, ssh-agent is already set up with launchd and Keychain. This can be verified by executing "echo $SSH_AUTH_SOCK" in your favorite shell. You can use `ssh-add -k` and `-K` to add your keys and passphrases to your keychain.
+ Multi-node cluster setup is largely similar to single-node (pseudo-distributed) setup, except
for the following:
- === Bootstrapping ===
- A new distributed filesystem must be formatted with the following command, run on the master
- {{{bin/hadoop namenode -format}}}
- If asked to [re]format, you must reply Y (not just y) if you want to reformat, else Hadoop
will abort the format.
- You should see a quick series of `STARTUP_MSG`s and a `SHUTDOWN_MSG`
- Open the {{{conf/hadoop-env.sh}}} file and define {{{JAVA_HOME}}} in it.
- Then start up the Hadoop daemon with 
- {{{bin/start-all.sh}}}
- It should notify you that it's starting the `namenode`, `datanode`, `secondarynamenode`,
and `jobtracker`. 
- Input files are copied into the distributed filesystem as follows: 
- {{{bin/hadoop dfs -put <localsrc> <dst>}}}
- For more details just type `bin/hadoop dfs` with no options.
- To shutdown:
- {{{bin/stop-all.sh}}}
- === Browsing to the Services ===
- Once the pseudo-distributed cluster is live, you can point your web browser at it by connecting to localhost at the chosen ports.
- If you have left the values at their defaults, the page PseudoDistributedHadoop provides shortcuts to these pages.
- == Stage 3: Fully-distributed operation ==
- Fully distributed operation is just like the pseudo-distributed operation described above, except that you specify:
   1. The hostname or IP address of your master server in the value for fs.default.name, as
hdfs://master.example.com/ in conf/core-site.xml.
   1. The host and port of your master server in the value of mapred.job.tracker as master.example.com:port in conf/mapred-site.xml.
@@ -153, +31 @@

   1. mapred.map.tasks and mapred.reduce.tasks in conf/mapred-site.xml. As a rule of thumb,
use 10x the number of slave processors for mapred.map.tasks, and 2x the number of slave processors
for mapred.reduce.tasks.
   1. Finally, list all slave hostnames or IP addresses in your conf/slaves file, one per
line. Then format your filesystem and start your cluster on your master node, as above.
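For the last step above, conf/slaves is just a plain list, one worker per line; a sketch with hypothetical hostnames:

```
slave1.example.com
slave2.example.com
192.0.2.10
```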
- See [[http://hadoop.apache.org/common/docs/current/cluster_setup.html#Configuration | Hadoop Cluster Setup/Configuration]] for details.
+ See [[http://hadoop.apache.org/common/docs/stable/cluster_setup.html#Configuration | Hadoop Cluster Setup/Configuration]] for details.
