From: cnauroth@apache.org
To: common-commits@hadoop.apache.org
Reply-To: common-dev@hadoop.apache.org
Date: Tue, 13 Jan 2015 17:50:02 -0000
Subject: [01/50] [abbrv] hadoop git commit: HADOOP-10908. Common needs updates for shell rewrite (aw)

Repository: hadoop
Updated Branches:
  refs/heads/HDFS-6994 2e42564ad -> a607429b5

HADOOP-10908. Common needs updates for shell rewrite (aw)

Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/94d342e6
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/94d342e6
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/94d342e6

Branch: refs/heads/HDFS-6994
Commit: 94d342e607e1db317bae7af86a34ae7cd3860348
Parents: 41d72cb
Author: Allen Wittenauer
Authored: Mon Jan 5 14:26:41 2015 -0800
Committer: Allen Wittenauer
Committed: Mon Jan 5 14:26:41 2015 -0800

----------------------------------------------------------------------
 hadoop-common-project/hadoop-common/CHANGES.txt |   2 +
 .../src/site/apt/ClusterSetup.apt.vm            | 348 ++++++++-----------
 .../src/site/apt/CommandsManual.apt.vm          | 316 +++++++++--------
 .../src/site/apt/FileSystemShell.apt.vm         | 313 ++++++++++-------
 .../src/site/apt/SingleCluster.apt.vm           |  20 +-
 5 files changed, 534 insertions(+), 465 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/hadoop/blob/94d342e6/hadoop-common-project/hadoop-common/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/CHANGES.txt b/hadoop-common-project/hadoop-common/CHANGES.txt
index 0c76894..40e8d29 100644
--- a/hadoop-common-project/hadoop-common/CHANGES.txt
+++ b/hadoop-common-project/hadoop-common/CHANGES.txt
@@ -344,6 +344,8 @@ Trunk (Unreleased)
     HADOOP-11397. Can't override HADOOP_IDENT_STRING (Kengo Seki via aw)

+    HADOOP-10908. Common needs updates for shell rewrite (aw)
+
   OPTIMIZATIONS

     HADOOP-7761. Improve the performance of raw comparisons.
(todd) http://git-wip-us.apache.org/repos/asf/hadoop/blob/94d342e6/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm index f5f1deb..52b0552 100644 --- a/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm +++ b/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm @@ -11,83 +11,81 @@ ~~ limitations under the License. See accompanying LICENSE file. --- - Hadoop Map Reduce Next Generation-${project.version} - Cluster Setup + Hadoop ${project.version} - Cluster Setup --- --- ${maven.build.timestamp} %{toc|section=1|fromDepth=0} -Hadoop MapReduce Next Generation - Cluster Setup +Hadoop Cluster Setup * {Purpose} - This document describes how to install, configure and manage non-trivial + This document describes how to install and configure Hadoop clusters ranging from a few nodes to extremely large clusters - with thousands of nodes. + with thousands of nodes. To play with Hadoop, you may first want to + install it on a single machine (see {{{./SingleCluster.html}Single Node Setup}}). - To play with Hadoop, you may first want to install it on a single - machine (see {{{./SingleCluster.html}Single Node Setup}}). + This document does not cover advanced topics such as {{{./SecureMode.html}Security}} or + High Availability. * {Prerequisites} - Download a stable version of Hadoop from Apache mirrors. + * Install Java. See the {{{http://wiki.apache.org/hadoop/HadoopJavaVersions}Hadoop Wiki}} for known good versions. + * Download a stable version of Hadoop from Apache mirrors. * {Installation} Installing a Hadoop cluster typically involves unpacking the software on all - the machines in the cluster or installing RPMs. + the machines in the cluster or installing it via a packaging system as + appropriate for your operating system. It is important to divide up the hardware + into functions. Typically one machine in the cluster is designated as the NameNode and - another machine the as ResourceManager, exclusively. These are the masters. + another machine the as ResourceManager, exclusively. These are the masters. Other + services (such as Web App Proxy Server and MapReduce Job History server) are usually + run either on dedicated hardware or on shared infrastrucutre, depending upon the load. The rest of the machines in the cluster act as both DataNode and NodeManager. These are the slaves. -* {Running Hadoop in Non-Secure Mode} +* {Configuring Hadoop in Non-Secure Mode} - The following sections describe how to configure a Hadoop cluster. - - {Configuration Files} - - Hadoop configuration is driven by two types of important configuration files: + Hadoop's Java configuration is driven by two types of important configuration files: * Read-only default configuration - <<>>, <<>>, <<>> and <<>>. - * Site-specific configuration - <>, - <>, <> and - <>. - + * Site-specific configuration - <<>>, + <<>>, <<>> and + <<>>. - Additionally, you can control the Hadoop scripts found in the bin/ - directory of the distribution, by setting site-specific values via the - <> and <>. - {Site Configuration} + Additionally, you can control the Hadoop scripts found in the bin/ + directory of the distribution, by setting site-specific values via the + <<>> and <<>>. 
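  For illustration only, a minimal sketch of such site-specific values,
  assuming they live in the customary <<<etc/hadoop/hadoop-env.sh>>> file
  (the paths shown are placeholders, not recommendations):

----
# Illustrative hadoop-env.sh overrides; adjust the paths for your site.
export JAVA_HOME=/usr/java/latest        # JVM used by all Hadoop daemons
export HADOOP_LOG_DIR=/var/log/hadoop    # where daemon log files are written
export HADOOP_PID_DIR=/var/run/hadoop    # where daemon pid files are kept
----

  Keeping these overrides in one environment file means every daemon and
  client script started from this installation picks them up consistently.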
To configure the Hadoop cluster you will need to configure the <<>> in which the Hadoop daemons execute as well as the <<>> for the Hadoop daemons. - The Hadoop daemons are NameNode/DataNode and ResourceManager/NodeManager. + HDFS daemons are NameNode, SecondaryNameNode, and DataNode. YARN damones + are ResourceManager, NodeManager, and WebAppProxy. If MapReduce is to be + used, then the MapReduce Job History Server will also be running. For + large installations, these are generally running on separate hosts. ** {Configuring Environment of Hadoop Daemons} - Administrators should use the <> and - <> script to do site-specific customization of the - Hadoop daemons' process environment. + Administrators should use the <<>> and optionally the + <<>> and <<>> scripts to do + site-specific customization of the Hadoop daemons' process environment. - At the very least you should specify the <<>> so that it is + At the very least, you must specify the <<>> so that it is correctly defined on each remote node. - In most cases you should also specify <<>> and - <<>> to point to directories that can only be - written to by the users that are going to run the hadoop daemons. - Otherwise there is the potential for a symlink attack. - Administrators can configure individual daemons using the configuration options shown below in the table: @@ -114,20 +112,42 @@ Hadoop MapReduce Next Generation - Cluster Setup statement should be added in hadoop-env.sh : ---- - export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}" + export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC" ---- + See <<>> for other examples. + Other useful configuration parameters that you can customize include: - * <<>> / <<>> - The directory where the - daemons' log files are stored. They are automatically created if they - don't exist. + * <<>> - The directory where the + daemons' process id files are stored. + + * <<>> - The directory where the + daemons' log files are stored. Log files are automatically created + if they don't exist. + + * <<>> - The maximum amount of + memory to use for the Java heapsize. Units supported by the JVM + are also supported here. If no unit is present, it will be assumed + the number is in megabytes. By default, Hadoop will let the JVM + determine how much to use. This value can be overriden on + a per-daemon basis using the appropriate <<<_OPTS>>> variable listed above. + For example, setting <<>> and + <<>> will configure the NameNode with 5GB heap. + + In most cases, you should specify the <<>> and + <<>> directories such that they can only be + written to by the users that are going to run the hadoop daemons. + Otherwise there is the potential for a symlink attack. + + It is also traditional to configure <<>> in the system-wide + shell environment configuration. For example, a simple script inside + <<>>: - * <<>> / <<>> - The maximum amount of - heapsize to use, in MB e.g. if the varibale is set to 1000 the heap - will be set to 1000MB. This is used to configure the heap - size for the daemon. By default, the value is 1000. If you want to - configure the values separately for each deamon you can use. 
+--- + HADOOP_PREFIX=/path/to/hadoop + export HADOOP_PREFIX +--- *--------------------------------------+--------------------------------------+ || Daemon || Environment Variable | @@ -141,12 +161,12 @@ Hadoop MapReduce Next Generation - Cluster Setup | Map Reduce Job History Server | HADOOP_JOB_HISTORYSERVER_HEAPSIZE | *--------------------------------------+--------------------------------------+ -** {Configuring the Hadoop Daemons in Non-Secure Mode} +** {Configuring the Hadoop Daemons} This section deals with important parameters to be specified in the given configuration files: - * <<>> + * <<>> *-------------------------+-------------------------+------------------------+ || Parameter || Value || Notes | @@ -157,7 +177,7 @@ Hadoop MapReduce Next Generation - Cluster Setup | | | Size of read/write buffer used in SequenceFiles. | *-------------------------+-------------------------+------------------------+ - * <<>> + * <<>> * Configurations for NameNode: @@ -195,7 +215,7 @@ Hadoop MapReduce Next Generation - Cluster Setup | | | stored in all named directories, typically on different devices. | *-------------------------+-------------------------+------------------------+ - * <<>> + * <<>> * Configurations for ResourceManager and NodeManager: @@ -341,9 +361,7 @@ Hadoop MapReduce Next Generation - Cluster Setup | | | Be careful, set this too small and you will spam the name node. | *-------------------------+-------------------------+------------------------+ - - - * <<>> + * <<>> * Configurations for MapReduce Applications: @@ -395,22 +413,6 @@ Hadoop MapReduce Next Generation - Cluster Setup | | | Directory where history files are managed by the MR JobHistory Server. | *-------------------------+-------------------------+------------------------+ -* {Hadoop Rack Awareness} - - The HDFS and the YARN components are rack-aware. - - The NameNode and the ResourceManager obtains the rack information of the - slaves in the cluster by invoking an API in an administrator - configured module. - - The API resolves the DNS name (also IP address) to a rack id. - - The site-specific module to use can be configured using the configuration - item <<>>. The default implementation - of the same runs a script/command configured using - <<>>. If <<>> is - not set, the rack id is returned for any passed IP address. - * {Monitoring Health of NodeManagers} Hadoop provides a mechanism by which administrators can configure the @@ -433,7 +435,7 @@ Hadoop MapReduce Next Generation - Cluster Setup node was healthy is also displayed on the web interface. The following parameters can be used to control the node health - monitoring script in <<>>. + monitoring script in <<>>. *-------------------------+-------------------------+------------------------+ || Parameter || Value || Notes | @@ -465,224 +467,170 @@ Hadoop MapReduce Next Generation - Cluster Setup disk is either raided or a failure in the boot disk is identified by the health checker script. -* {Slaves file} +* {Slaves File} - Typically you choose one machine in the cluster to act as the NameNode and - one machine as to act as the ResourceManager, exclusively. The rest of the - machines act as both a DataNode and NodeManager and are referred to as - . + List all slave hostnames or IP addresses in your <<>> + file, one per line. Helper scripts (described below) will use the + <<>> file to run commands on many hosts at once. It is not + used for any of the Java-based Hadoop configuration. 
In order + to use this functionality, ssh trusts (via either passphraseless ssh or + some other means, such as Kerberos) must be established for the accounts + used to run Hadoop. - List all slave hostnames or IP addresses in your <<>> file, - one per line. +* {Hadoop Rack Awareness} + + Many Hadoop components are rack-aware and take advantage of the + network topology for performance and safety. Hadoop daemons obtain the + rack information of the slaves in the cluster by invoking an administrator + configured module. See the {{{./RackAwareness.html}Rack Awareness}} + documentation for more specific information. + + It is highly recommended configuring rack awareness prior to starting HDFS. * {Logging} - Hadoop uses the Apache log4j via the Apache Commons Logging framework for - logging. Edit the <<>> file to customize the + Hadoop uses the {{{http://logging.apache.org/log4j/2.x/}Apache log4j}} via the Apache Commons Logging framework for + logging. Edit the <<>> file to customize the Hadoop daemons' logging configuration (log-formats and so on). * {Operating the Hadoop Cluster} Once all the necessary configuration is complete, distribute the files to the - <<>> directory on all the machines. + <<>> directory on all the machines. This should be the + same directory on all machines. + + In general, it is recommended that HDFS and YARN run as separate users. + In the majority of installations, HDFS processes execute as 'hdfs'. YARN + is typically using the 'yarn' account. ** Hadoop Startup - To start a Hadoop cluster you will need to start both the HDFS and YARN - cluster. + To start a Hadoop cluster you will need to start both the HDFS and YARN + cluster. - Format a new distributed filesystem: + The first time you bring up HDFS, it must be formatted. Format a new + distributed filesystem as : ---- -$ $HADOOP_PREFIX/bin/hdfs namenode -format +[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format ---- - Start the HDFS with the following command, run on the designated NameNode: + Start the HDFS NameNode with the following command on the + designated node as : ---- -$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode ----- - - Run a script to start DataNodes on all slaves: - +[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start namenode ---- -$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode ----- - Start the YARN with the following command, run on the designated - ResourceManager: + Start a HDFS DataNode with the following command on each + designated node as : ---- -$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager ----- - - Run a script to start NodeManagers on all slaves: - +[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start datanode ---- -$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager ----- - Start a standalone WebAppProxy server. If multiple servers - are used with load balancing it should be run on each of them: + If <<>> and ssh trusted access is configured + (see {{{./SingleCluster.html}Single Node Setup}}), all of the + HDFS processes can be started with a utility script. 
As : ---- -$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start proxyserver --config $HADOOP_CONF_DIR +[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh ---- - Start the MapReduce JobHistory Server with the following command, run on the - designated server: + Start the YARN with the following command, run on the designated + ResourceManager as : ---- -$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR ----- - -** Hadoop Shutdown - - Stop the NameNode with the following command, run on the designated - NameNode: - +[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start resourcemanager ---- -$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode ----- - Run a script to stop DataNodes on all slaves: + Run a script to start a NodeManager on each designated host as : ---- -$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode ----- - - Stop the ResourceManager with the following command, run on the designated - ResourceManager: - +[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start nodemanager ---- -$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager ----- - Run a script to stop NodeManagers on all slaves: + Start a standalone WebAppProxy server. Run on the WebAppProxy + server as . If multiple servers are used with load balancing + it should be run on each of them: ---- -$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager ----- +[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon start proxyserver +---- - Stop the WebAppProxy server. If multiple servers are used with load - balancing it should be run on each of them: + If <<>> and ssh trusted access is configured + (see {{{./SingleCluster.html}Single Node Setup}}), all of the + YARN processes can be started with a utility script. As : ---- -$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh stop proxyserver --config $HADOOP_CONF_DIR +[yarn]$ $HADOOP_PREFIX/sbin/start-yarn.sh ---- - - Stop the MapReduce JobHistory Server with the following command, run on the - designated server: + Start the MapReduce JobHistory Server with the following command, run + on the designated server as : ---- -$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR ----- - - -* {Operating the Hadoop Cluster} - - Once all the necessary configuration is complete, distribute the files to the - <<>> directory on all the machines. - - This section also describes the various Unix users who should be starting the - various components and uses the same Unix accounts and groups used previously: - -** Hadoop Startup +[mapred]$ $HADOOP_PREFIX/bin/mapred --daemon start historyserver +---- - To start a Hadoop cluster you will need to start both the HDFS and YARN - cluster. 
+** Hadoop Shutdown - Format a new distributed filesystem as : + Stop the NameNode with the following command, run on the designated NameNode + as : ---- -[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format +[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop namenode ---- - Start the HDFS with the following command, run on the designated NameNode - as : + Run a script to stop a DataNode as : ---- -[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode ----- - - Run a script to start DataNodes on all slaves as with a special - environment variable <<>> set to : - +[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop datanode ---- -[root]$ HADOOP_SECURE_DN_USER=hdfs $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode ----- - Start the YARN with the following command, run on the designated - ResourceManager as : + If <<>> and ssh trusted access is configured + (see {{{./SingleCluster.html}Single Node Setup}}), all of the + HDFS processes may be stopped with a utility script. As : ---- -[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager ----- - - Run a script to start NodeManagers on all slaves as : - +[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh ---- -[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager ----- - Start a standalone WebAppProxy server. Run on the WebAppProxy - server as . If multiple servers are used with load balancing - it should be run on each of them: + Stop the ResourceManager with the following command, run on the designated + ResourceManager as : ---- -[yarn]$ $HADOOP_YARN_HOME/bin/yarn start proxyserver --config $HADOOP_CONF_DIR ----- - - Start the MapReduce JobHistory Server with the following command, run on the - designated server as : - +[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop resourcemanager ---- -[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR ----- -** Hadoop Shutdown - - Stop the NameNode with the following command, run on the designated NameNode - as : + Run a script to stop a NodeManager on a slave as : ---- -[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode ----- - - Run a script to stop DataNodes on all slaves as : - +[yarn]$ $HADOOP_PREFIX/bin/yarn --daemon stop nodemanager ---- -[root]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode ----- - Stop the ResourceManager with the following command, run on the designated - ResourceManager as : + If <<>> and ssh trusted access is configured + (see {{{./SingleCluster.html}Single Node Setup}}), all of the + YARN processes can be stopped with a utility script. As : ---- -[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager ----- - - Run a script to stop NodeManagers on all slaves as : - +[yarn]$ $HADOOP_PREFIX/sbin/stop-yarn.sh ---- -[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager ----- Stop the WebAppProxy server. Run on the WebAppProxy server as . 
If multiple servers are used with load balancing it should be run on each of them: ---- -[yarn]$ $HADOOP_YARN_HOME/bin/yarn stop proxyserver --config $HADOOP_CONF_DIR +[yarn]$ $HADOOP_PREFIX/bin/yarn stop proxyserver ---- Stop the MapReduce JobHistory Server with the following command, run on the designated server as : ---- -[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR ----- +[mapred]$ $HADOOP_PREFIX/bin/mapred --daemon stop historyserver +---- * {Web Interfaces} http://git-wip-us.apache.org/repos/asf/hadoop/blob/94d342e6/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm index 6d2fd5e..67c8bc3 100644 --- a/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm +++ b/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm @@ -21,102 +21,161 @@ %{toc} -Overview +Hadoop Commands Guide - All hadoop commands are invoked by the <<>> script. Running the - hadoop script without any arguments prints the description for all - commands. +* Overview - Usage: <<>> + All of the Hadoop commands and subprojects follow the same basic structure: - Hadoop has an option parsing framework that employs parsing generic - options as well as running classes. + Usage: <<>> +*--------+---------+ +|| FIELD || Description *-----------------------+---------------+ -|| COMMAND_OPTION || Description +| shellcommand | The command of the project being invoked. For example, + | Hadoop common uses <<>>, HDFS uses <<>>, + | and YARN uses <<>>. +*---------------+-------------------+ +| SHELL_OPTIONS | Options that the shell processes prior to executing Java. *-----------------------+---------------+ -| <<<--config confdir>>>| Overwrites the default Configuration directory. Default is <<<${HADOOP_HOME}/conf>>>. +| COMMAND | Action to perform. *-----------------------+---------------+ -| <<<--loglevel loglevel>>>| Overwrites the log level. Valid log levels are -| | FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. -| | Default is INFO. +| GENERIC_OPTIONS | The common set of options supported by + | multiple commands. *-----------------------+---------------+ -| GENERIC_OPTIONS | The common set of options supported by multiple commands. -| COMMAND_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into User Commands and Administration Commands. +| COMMAND_OPTIONS | Various commands with their options are + | described in this documention for the + | Hadoop common sub-project. HDFS and YARN are + | covered in other documents. *-----------------------+---------------+ -Generic Options +** {Shell Options} - The following options are supported by {{dfsadmin}}, {{fs}}, {{fsck}}, - {{job}} and {{fetchdt}}. Applications should implement - {{{../../api/org/apache/hadoop/util/Tool.html}Tool}} to support - GenericOptions. + All of the shell commands will accept a common set of options. For some commands, + these options are ignored. For example, passing <<<---hostnames>>> on a + command that only executes on a single host will be ignored. + +*-----------------------+---------------+ +|| SHELL_OPTION || Description +*-----------------------+---------------+ +| <<<--buildpaths>>> | Enables developer versions of jars. 
+*-----------------------+---------------+ +| <<<--config confdir>>> | Overwrites the default Configuration + | directory. Default is <<<${HADOOP_PREFIX}/conf>>>. +*-----------------------+----------------+ +| <<<--daemon mode>>> | If the command supports daemonization (e.g., + | <<>>), execute in the appropriate + | mode. Supported modes are <<>> to start the + | process in daemon mode, <<>> to stop the + | process, and <<>> to determine the active + | status of the process. <<>> will return + | an {{{http://refspecs.linuxbase.org/LSB_3.0.0/LSB-generic/LSB-generic/iniscrptact.html}LSB-compliant}} result code. + | If no option is provided, commands that support + | daemonization will run in the foreground. +*-----------------------+---------------+ +| <<<--debug>>> | Enables shell level configuration debugging information +*-----------------------+---------------+ +| <<<--help>>> | Shell script usage information. +*-----------------------+---------------+ +| <<<--hostnames>>> | A space delimited list of hostnames where to execute + | a multi-host subcommand. By default, the content of + | the <<>> file is used. +*-----------------------+----------------+ +| <<<--hosts>>> | A file that contains a list of hostnames where to execute + | a multi-host subcommand. By default, the content of the + | <<>> file is used. +*-----------------------+----------------+ +| <<<--loglevel loglevel>>> | Overrides the log level. Valid log levels are +| | FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. +| | Default is INFO. +*-----------------------+---------------+ + +** {Generic Options} + + Many subcommands honor a common set of configuration options to alter their behavior: *------------------------------------------------+-----------------------------+ || GENERIC_OPTION || Description *------------------------------------------------+-----------------------------+ +|<<<-archives \ >>> | Specify comma separated + | archives to be unarchived on + | the compute machines. Applies + | only to job. +*------------------------------------------------+-----------------------------+ |<<<-conf \ >>> | Specify an application | configuration file. *------------------------------------------------+-----------------------------+ |<<<-D \=\ >>> | Use value for given property. *------------------------------------------------+-----------------------------+ -|<<<-jt \ or \>>> | Specify a ResourceManager. - | Applies only to job. -*------------------------------------------------+-----------------------------+ |<<<-files \ >>> | Specify comma separated files | to be copied to the map | reduce cluster. Applies only | to job. *------------------------------------------------+-----------------------------+ +|<<<-jt \ or \>>> | Specify a ResourceManager. + | Applies only to job. +*------------------------------------------------+-----------------------------+ |<<<-libjars \ >>>| Specify comma separated jar | files to include in the | classpath. Applies only to | job. *------------------------------------------------+-----------------------------+ -|<<<-archives \ >>> | Specify comma separated - | archives to be unarchived on - | the compute machines. Applies - | only to job. -*------------------------------------------------+-----------------------------+ -User Commands +Hadoop Common Commands - Commands useful for users of a hadoop cluster. + All of these commands are executed from the <<>> shell command. They + have been broken up into {{User Commands}} and + {{Admininistration Commands}}. 
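  As a sketch of how the shell options and generic options described above
  combine in practice (the NameNode URI is only a placeholder, and any
  Tool-based subcommand could stand in for <<<fs>>>):

----
# Override the shell log level for a single invocation:
$ hadoop --loglevel DEBUG version

# Pass a generic -D property through to a Tool-based command:
$ hadoop fs -D fs.defaultFS=hdfs://nn.example.com:8020 -ls /
----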
+ +* User Commands -* <<>> + Commands useful for users of a hadoop cluster. +** <<>> + Creates a hadoop archive. More information can be found at - {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopArchives.html} + {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopArchives.html} Hadoop Archives Guide}}. -* <<>> +** <<>> - Command to manage credentials, passwords and secrets within credential providers. + Usage: <<>> - The CredentialProvider API in Hadoop allows for the separation of applications - and how they store their required passwords/secrets. In order to indicate - a particular provider type and location, the user must provide the - configuration element in core-site.xml - or use the command line option <<<-provider>>> on each of the following commands. - This provider path is a comma-separated list of URLs that indicates the type and - location of a list of providers that should be consulted. - For example, the following path: +*-----------------+-----------------------------------------------------------+ +|| COMMAND_OPTION || Description +*-----------------+-----------------------------------------------------------+ +| -a | Check all libraries are available. +*-----------------+-----------------------------------------------------------+ +| -h | print help +*-----------------+-----------------------------------------------------------+ - <<>> + This command checks the availability of the Hadoop native code. See + {{{NativeLibraries.html}}} for more information. By default, this command + only checks the availability of libhadoop. - indicates that the current user's credentials file should be consulted through - the User Provider, that the local file located at <<>> is a Java Keystore - Provider and that the file located within HDFS at <<>> - is also a store for a Java Keystore Provider. +** <<>> - When utilizing the credential command it will often be for provisioning a password - or secret to a particular credential store provider. In order to explicitly - indicate which provider store to use the <<<-provider>>> option should be used. Otherwise, - given a path of multiple providers, the first non-transient provider will be used. - This may or may not be the one that you intended. + Usage: <<|-h|--help]>>> - Example: <<<-provider jceks://file/tmp/test.jceks>>> +*-----------------+-----------------------------------------------------------+ +|| COMMAND_OPTION || Description +*-----------------+-----------------------------------------------------------+ +| --glob | expand wildcards +*-----------------+-----------------------------------------------------------+ +| --jar | write classpath as manifest in jar named +*-----------------+-----------------------------------------------------------+ +| -h, --help | print help +*-----------------+-----------------------------------------------------------+ + + Prints the class path needed to get the Hadoop jar and the required + libraries. If called without arguments, then prints the classpath set up by + the command scripts, which is likely to contain wildcards in the classpath + entries. Additional options print the classpath after wildcard expansion or + write the classpath into the manifest of a jar file. The latter is useful in + environments where wildcards cannot be used and the expanded classpath exceeds + the maximum supported command line length. + +** <<>> Usage: << [options]>>> @@ -143,109 +202,96 @@ User Commands | indicated. 
*-------------------+-------------------------------------------------------+ -* <<>> + Command to manage credentials, passwords and secrets within credential providers. - Copy file or directories recursively. More information can be found at - {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistCp.html} - Hadoop DistCp Guide}}. + The CredentialProvider API in Hadoop allows for the separation of applications + and how they store their required passwords/secrets. In order to indicate + a particular provider type and location, the user must provide the + configuration element in core-site.xml + or use the command line option <<<-provider>>> on each of the following commands. + This provider path is a comma-separated list of URLs that indicates the type and + location of a list of providers that should be consulted. For example, the following path: + <<>> -* <<>> + indicates that the current user's credentials file should be consulted through + the User Provider, that the local file located at <<>> is a Java Keystore + Provider and that the file located within HDFS at <<>> + is also a store for a Java Keystore Provider. - Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#dfs}<<>>}} - instead. + When utilizing the credential command it will often be for provisioning a password + or secret to a particular credential store provider. In order to explicitly + indicate which provider store to use the <<<-provider>>> option should be used. Otherwise, + given a path of multiple providers, the first non-transient provider will be used. + This may or may not be the one that you intended. -* <<>> + Example: <<<-provider jceks://file/tmp/test.jceks>>> - Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#fsck}<<>>}} - instead. +** <<>> -* <<>> + Usage: <<>> + +*-------------------+-------------------------------------------------------+ +||COMMAND_OPTION || Description +*-------------------+-------------------------------------------------------+ +| -f | List of objects to change +*----+------------+ +| -i | Ignore failures +*----+------------+ +| -log | Directory to log output +*-----+---------+ - Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#fetchdt} - <<>>}} instead. + Change the ownership and permissions on many files at once. -* <<>> +** <<>> - Runs a jar file. Users can bundle their Map Reduce code in a jar file and - execute it using this command. + Copy file or directories recursively. More information can be found at + {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistCp.html} + Hadoop DistCp Guide}}. - Usage: << [mainClass] args...>>> +** <<>> - The streaming jobs are run via this command. Examples can be referred from - Streaming examples + This command is documented in the {{{./FileSystemShell.html}File System Shell Guide}}. It is a synonym for <<>> when HDFS is in use. - Word count example is also run using jar command. It can be referred from - Wordcount example +** <<>> - Use {{{../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html#jar}<<>>}} - to launch YARN applications instead. + Usage: << [mainClass] args...>>> -* <<>> + Runs a jar file. + + Use {{{../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html#jar}<<>>}} + to launch YARN applications instead. - Deprecated. Use - {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#job} - <<>>}} instead. +** <<>> -* <<>> + Usage: <<>> - Deprecated. Use - {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#pipes} - <<>>}} instead. 
+ Print the computed java.library.path. -* <<>> +** <<>> - Deprecated. Use - {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#queue} - <<>>}} instead. + Manage keys via the KeyProvider. -* <<>> +** <<>> - Prints the version. + View and modify Hadoop tracing settings. See the {{{./Tracing.html}Tracing Guide}}. + +** <<>> Usage: <<>> -* <<>> + Prints the version. - hadoop script can be used to invoke any class. +** <<>> Usage: <<>> - Runs the class named <<>>. - -* <<>> - - Prints the class path needed to get the Hadoop jar and the required - libraries. If called without arguments, then prints the classpath set up by - the command scripts, which is likely to contain wildcards in the classpath - entries. Additional options print the classpath after wildcard expansion or - write the classpath into the manifest of a jar file. The latter is useful in - environments where wildcards cannot be used and the expanded classpath exceeds - the maximum supported command line length. + Runs the class named <<>>. The class must be part of a package. - Usage: <<|-h|--help]>>> - -*-----------------+-----------------------------------------------------------+ -|| COMMAND_OPTION || Description -*-----------------+-----------------------------------------------------------+ -| --glob | expand wildcards -*-----------------+-----------------------------------------------------------+ -| --jar | write classpath as manifest in jar named -*-----------------+-----------------------------------------------------------+ -| -h, --help | print help -*-----------------+-----------------------------------------------------------+ - -Administration Commands +* {Administration Commands} Commands useful for administrators of a hadoop cluster. -* <<>> - - Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#balancer} - <<>>}} instead. - -* <<>> - - Get/Set the log level for each daemon. +** <<>> Usage: << >>> Usage: << >>> @@ -262,22 +308,20 @@ Administration Commands | connects to http:///logLevel?log= *------------------------------+-----------------------------------------------------------+ -* <<>> + Get/Set the log level for each daemon. - Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#datanode} - <<>>}} instead. +* Files -* <<>> +** <> - Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#dfsadmin} - <<>>}} instead. + This file stores the global settings used by all Hadoop shell commands. -* <<>> +** <> - Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#namenode} - <<>>}} instead. + This file allows for advanced users to override some shell functionality. -* <<>> +** <<~/.hadooprc>> - Deprecated, use {{{../hadoop-hdfs/HDFSCommands.html#secondarynamenode} - <<>>}} instead. + This stores the personal environment for an individual user. It is + processed after the hadoop-env.sh and hadoop-user-functions.sh files + and can contain the same settings. http://git-wip-us.apache.org/repos/asf/hadoop/blob/94d342e6/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm index 1a9618c..757a0ba 100644 --- a/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm +++ b/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm @@ -45,46 +45,62 @@ bin/hadoop fs Differences are described with each of the commands. 
Error information is sent to stderr and the output is sent to stdout. -appendToFile + If HDFS is being used, <<>> is a synonym. - Usage: << ... >>> + See the {{{./CommandsManual.html}Commands Manual}} for generic shell options. + +* appendToFile + + Usage: << ... >>> Append single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and appends to destination file system. - * <<>> + * <<>> - * <<>> + * <<>> - * <<>> + * <<>> - * <<>> + * <<>> Reads the input from stdin. Exit Code: Returns 0 on success and 1 on error. -cat +* cat - Usage: <<>> + Usage: <<>> Copies source paths to stdout. Example: - * <<>> + * <<>> - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. -chgrp +* checksum + + Usage: <<>> + + Returns the checksum information of a file. + + Example: + + * <<>> - Usage: <<>> + * <<>> + +* chgrp + + Usage: <<>> Change group association of files. The user must be the owner of files, or else a super-user. Additional information is in the @@ -94,9 +110,9 @@ chgrp * The -R option will make the change recursively through the directory structure. -chmod +* chmod - Usage: << URI [URI ...]>>> + Usage: << URI [URI ...]>>> Change the permissions of files. With -R, make the change recursively through the directory structure. The user must be the owner of the file, or @@ -107,9 +123,9 @@ chmod * The -R option will make the change recursively through the directory structure. -chown +* chown - Usage: <<>> + Usage: <<>> Change the owner of files. The user must be a super-user. Additional information is in the {{{../hadoop-hdfs/HdfsPermissionsGuide.html}Permissions Guide}}. @@ -118,9 +134,9 @@ chown * The -R option will make the change recursively through the directory structure. -copyFromLocal +* copyFromLocal - Usage: << URI>>> + Usage: << URI>>> Similar to put command, except that the source is restricted to a local file reference. @@ -129,16 +145,16 @@ copyFromLocal * The -f option will overwrite the destination if it already exists. -copyToLocal +* copyToLocal - Usage: << >>> + Usage: << >>> Similar to get command, except that the destination is restricted to a local file reference. -count +* count - Usage: << >>> + Usage: << >>> Count the number of directories, files and bytes under the paths that match the specified file pattern. The output columns with -count are: DIR_COUNT, @@ -151,19 +167,19 @@ count Example: - * <<>> + * <<>> - * <<>> + * <<>> - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. -cp +* cp - Usage: << >>> + Usage: << >>> Copy files from source to destination. This command allows multiple sources as well in which case the destination must be a directory. @@ -177,7 +193,7 @@ cp Options: * The -f option will overwrite the destination if it already exists. - + * The -p option will preserve file attributes [topx] (timestamps, ownership, permission, ACL, XAttr). If -p is specified with no , then preserves timestamps, ownership, permission. If -pa is specified, @@ -187,17 +203,41 @@ cp Example: - * <<>> + * <<>> - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. -du +* createSnapshot + + See {{{../hadoop-hdfs/HdfsSnapshots.html}HDFS Snapshots Guide}}. + + +* deleteSnapshot + + See {{{../hadoop-hdfs/HdfsSnapshots.html}HDFS Snapshots Guide}}. + +* df + + Usage: <<>> + + Displays free space. 
+ + Options: + + * The -h option will format file sizes in a "human-readable" fashion (e.g + 64.0m instead of 67108864) + + Example: - Usage: <<>> + * <<>> + +* du + + Usage: <<>> Displays sizes of files and directories contained in the given directory or the length of a file in case its just a file. @@ -212,29 +252,29 @@ du Example: - * hdfs dfs -du /user/hadoop/dir1 /user/hadoop/file1 hdfs://nn.example.com/user/hadoop/dir1 + * <<>> Exit Code: Returns 0 on success and -1 on error. -dus +* dus - Usage: << >>> + Usage: << >>> Displays a summary of file lengths. - <> This command is deprecated. Instead use <<>>. + <> This command is deprecated. Instead use <<>>. -expunge +* expunge - Usage: <<>> + Usage: <<>> Empty the Trash. Refer to the {{{../hadoop-hdfs/HdfsDesign.html} HDFS Architecture Guide}} for more information on the Trash feature. -find +* find - Usage: << ... ... >>> + Usage: << ... ... >>> Finds all files that match the specified expression and applies selected actions to them. If no is specified then defaults to the current @@ -269,15 +309,15 @@ find Example: - <<>> + <<>> Exit Code: Returns 0 on success and -1 on error. -get +* get - Usage: << >>> + Usage: << >>> Copy files to the local file system. Files that fail the CRC check may be copied with the -ignorecrc option. Files and CRCs may be copied using the @@ -285,17 +325,17 @@ get Example: - * <<>> + * <<>> - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. -getfacl +* getfacl - Usage: << >>> + Usage: << >>> Displays the Access Control Lists (ACLs) of files and directories. If a directory has a default ACL, then getfacl also displays the default ACL. @@ -308,17 +348,17 @@ getfacl Examples: - * <<>> + * <<>> - * <<>> + * <<>> Exit Code: Returns 0 on success and non-zero on error. -getfattr +* getfattr - Usage: << >>> + Usage: << >>> Displays the extended attribute names and values (if any) for a file or directory. @@ -337,26 +377,32 @@ getfattr Examples: - * <<>> + * <<>> - * <<>> + * <<>> Exit Code: Returns 0 on success and non-zero on error. -getmerge +* getmerge - Usage: << [addnl]>>> + Usage: << [addnl]>>> Takes a source directory and a destination file as input and concatenates files in src into the destination local file. Optionally addnl can be set to enable adding a newline character at the end of each file. -ls +* help + + Usage: <<>> - Usage: << >>> + Return usage output. + +* ls + + Usage: << >>> Options: @@ -377,23 +423,23 @@ permissions userid groupid modification_date modification_time dirname Example: - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. -lsr +* lsr - Usage: << >>> + Usage: << >>> Recursive version of ls. - <> This command is deprecated. Instead use <<>> + <> This command is deprecated. Instead use <<>> -mkdir +* mkdir - Usage: << >>> + Usage: << >>> Takes path uri's as argument and creates directories. @@ -403,30 +449,30 @@ mkdir Example: - * <<>> + * <<>> - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. -moveFromLocal +* moveFromLocal - Usage: << >>> + Usage: << >>> Similar to put command, except that the source localsrc is deleted after it's copied. -moveToLocal +* moveToLocal - Usage: << >>> + Usage: << >>> Displays a "Not implemented yet" message. -mv +* mv - Usage: << >>> + Usage: << >>> Moves files from source to destination. This command allows multiple sources as well in which case the destination needs to be a directory. 
Moving files @@ -434,38 +480,42 @@ mv Example: - * <<>> + * <<>> - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. -put +* put - Usage: << ... >>> + Usage: << ... >>> Copy single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and writes to destination file system. - * <<>> + * <<>> - * <<>> + * <<>> - * <<>> + * <<>> - * <<>> + * <<>> Reads the input from stdin. Exit Code: Returns 0 on success and -1 on error. -rm +* renameSnapshot + + See {{{../hadoop-hdfs/HdfsSnapshots.html}HDFS Snapshots Guide}}. - Usage: <<>> +* rm + + Usage: <<>> Delete files specified as args. @@ -484,23 +534,37 @@ rm Example: - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. -rmr +* rmdir + + Usage: <<>> + + Delete a directory. + + Options: + + * --ignore-fail-on-non-empty: When using wildcards, do not fail if a directory still contains files. + + Example: + + * <<>> + +* rmr - Usage: <<>> + Usage: <<>> Recursive version of delete. - <> This command is deprecated. Instead use <<>> + <> This command is deprecated. Instead use <<>> -setfacl +* setfacl - Usage: <<} ]|[--set ] >>> + Usage: <<} ]|[--set ] >>> Sets Access Control Lists (ACLs) of files and directories. @@ -528,27 +592,27 @@ setfacl Examples: - * <<>> + * <<>> - * <<>> + * <<>> - * <<>> + * <<>> - * <<>> + * <<>> - * <<>> + * <<>> - * <<>> + * <<>> - * <<>> + * <<>> Exit Code: Returns 0 on success and non-zero on error. -setfattr +* setfattr - Usage: << >>> + Usage: << >>> Sets an extended attribute name and value for a file or directory. @@ -566,19 +630,19 @@ setfattr Examples: - * <<>> + * <<>> - * <<>> + * <<>> - * <<>> + * <<>> Exit Code: Returns 0 on success and non-zero on error. -setrep +* setrep - Usage: << >>> + Usage: << >>> Changes the replication factor of a file. If is a directory then the command recursively changes the replication factor of all files under @@ -593,28 +657,28 @@ setrep Example: - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. -stat +* stat - Usage: <<>> + Usage: <<>> Returns the stat information on the path. Example: - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. -tail +* tail - Usage: <<>> + Usage: <<>> Displays last kilobyte of the file to stdout. @@ -624,43 +688,54 @@ tail Example: - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. -test +* test - Usage: <<>> + Usage: <<>> Options: - * The -e option will check to see if the file exists, returning 0 if true. + * -d: f the path is a directory, return 0. + + * -e: if the path exists, return 0. - * The -z option will check to see if the file is zero length, returning 0 if true. + * -f: if the path is a file, return 0. - * The -d option will check to see if the path is directory, returning 0 if true. + * -s: if the path is not empty, return 0. + + * -z: if the file is zero length, return 0. Example: - * <<>> + * <<>> -text +* text - Usage: << >>> + Usage: << >>> Takes a source file and outputs the file in text format. The allowed formats are zip and TextRecordInputStream. -touchz +* touchz - Usage: <<>> + Usage: <<>> Create a file of zero length. Example: - * <<>> + * <<>> Exit Code: Returns 0 on success and -1 on error. + + +* usage + + Usage: <<>> + + Return the help for an individual command. 
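  As a final sketch, the <<<usage>>> and <<<help>>> subcommands described
  above might be invoked as follows; <<<ls>>> is just an example argument and
  any FsShell command name could be substituted:

----
# Brief usage line for a single command:
$ hadoop fs -usage ls

# Full help text, including all options:
$ hadoop fs -help ls
----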
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/hadoop/blob/94d342e6/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm index ef7532a..eb9c88a 100644 --- a/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm +++ b/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm @@ -11,12 +11,12 @@ ~~ limitations under the License. See accompanying LICENSE file. --- - Hadoop MapReduce Next Generation ${project.version} - Setting up a Single Node Cluster. + Hadoop ${project.version} - Setting up a Single Node Cluster. --- --- ${maven.build.timestamp} -Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. +Hadoop - Setting up a Single Node Cluster. %{toc|section=1|fromDepth=0} @@ -46,7 +46,9 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. HadoopJavaVersions}}. [[2]] ssh must be installed and sshd must be running to use the Hadoop - scripts that manage remote Hadoop daemons. + scripts that manage remote Hadoop daemons if the optional start + and stop scripts are to be used. Additionally, it is recommmended that + pdsh also be installed for better ssh resource management. ** Installing Software @@ -57,7 +59,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. ---- $ sudo apt-get install ssh - $ sudo apt-get install rsync + $ sudo apt-get install pdsh ---- * Download @@ -75,9 +77,6 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. ---- # set to the root of your Java installation export JAVA_HOME=/usr/java/latest - - # Assuming your installation directory is /usr/local/hadoop - export HADOOP_PREFIX=/usr/local/hadoop ---- Try the following command: @@ -158,6 +157,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. ---- $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys + $ chmod 0700 ~/.ssh/authorized_keys ---- ** Execution @@ -228,7 +228,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. $ sbin/stop-dfs.sh ---- -** YARN on Single Node +** YARN on a Single Node You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and running ResourceManager daemon and NodeManager daemon @@ -239,7 +239,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. [[1]] Configure parameters as follows: - etc/hadoop/mapred-site.xml: + <<>>: +---+ @@ -250,7 +250,7 @@ Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. +---+ - etc/hadoop/yarn-site.xml: + <<>>: +---+