From: cmccabe@apache.org To: common-commits@hadoop.apache.org Date: Wed, 25 Feb 2015 00:03:51 -0000 Subject: [4/6] hadoop git commit: HADOOP-11495. Backport "convert site documentation from apt to markdown" to branch-2 (Masatake Iwasaki via Colin P. McCabe) http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm deleted file mode 100644 index 0235219..0000000 --- a/hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm +++ /dev/null @@ -1,689 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. -~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. See accompanying LICENSE file. - - --- - Hadoop in Secure Mode - --- - --- - ${maven.build.timestamp} - -%{toc|section=0|fromDepth=0|toDepth=3} - -Hadoop in Secure Mode - -* Introduction - - This document describes how to configure authentication for Hadoop in - secure mode. - - By default, Hadoop runs in non-secure mode, in which no actual - authentication is required. - By configuring Hadoop to run in secure mode, - each user and service must be authenticated by Kerberos - in order to use Hadoop services. - - Security features of Hadoop consist of - {{{Authentication}authentication}}, - {{{./ServiceLevelAuth.html}service level authorization}}, - {{{./HttpAuthentication.html}authentication for Web consoles}} - and {{{Data confidentiality}data confidentiality}}. 
- - -* Authentication - -** End User Accounts - - When service level authentication is turned on, - end users using Hadoop in secure mode needs to be authenticated by Kerberos. - The simplest way to do authentication is using <<>> command of Kerberos. - -** User Accounts for Hadoop Daemons - - Ensure that HDFS and YARN daemons run as different Unix users, - e.g. <<>> and <<>>. - Also, ensure that the MapReduce JobHistory server runs as - different user such as <<>>. - - It's recommended to have them share a Unix group, for e.g. <<>>. - See also "{{Mapping from user to group}}" for group management. - -*---------------+----------------------------------------------------------------------+ -|| User:Group || Daemons | -*---------------+----------------------------------------------------------------------+ -| hdfs:hadoop | NameNode, Secondary NameNode, JournalNode, DataNode | -*---------------+----------------------------------------------------------------------+ -| yarn:hadoop | ResourceManager, NodeManager | -*---------------+----------------------------------------------------------------------+ -| mapred:hadoop | MapReduce JobHistory Server | -*---------------+----------------------------------------------------------------------+ - -** Kerberos principals for Hadoop Daemons and Users - - For running hadoop service daemons in Hadoop in secure mode, - Kerberos principals are required. - Each service reads auhenticate information saved in keytab file with appropriate permission. - - HTTP web-consoles should be served by principal different from RPC's one. - - Subsections below shows the examples of credentials for Hadoop services. - -*** HDFS - - The NameNode keytab file, on the NameNode host, should look like the - following: - ----- -$ klist -e -k -t /etc/security/keytab/nn.service.keytab -Keytab name: FILE:/etc/security/keytab/nn.service.keytab -KVNO Timestamp Principal - 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - - The Secondary NameNode keytab file, on that host, should look like the - following: - ----- -$ klist -e -k -t /etc/security/keytab/sn.service.keytab -Keytab name: FILE:/etc/security/keytab/sn.service.keytab -KVNO Timestamp Principal - 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - - The DataNode keytab file, on each host, should look like the following: - ----- -$ klist -e -k -t /etc/security/keytab/dn.service.keytab -Keytab name: FILE:/etc/security/keytab/dn.service.keytab -KVNO 
Timestamp Principal - 4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - -*** YARN - - The ResourceManager keytab file, on the ResourceManager host, should look - like the following: - ----- -$ klist -e -k -t /etc/security/keytab/rm.service.keytab -Keytab name: FILE:/etc/security/keytab/rm.service.keytab -KVNO Timestamp Principal - 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - - The NodeManager keytab file, on each host, should look like the following: - ----- -$ klist -e -k -t /etc/security/keytab/nm.service.keytab -Keytab name: FILE:/etc/security/keytab/nm.service.keytab -KVNO Timestamp Principal - 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - -*** MapReduce JobHistory Server - - The MapReduce JobHistory Server keytab file, on that host, should look - like the following: - ----- -$ klist -e -k -t /etc/security/keytab/jhs.service.keytab -Keytab name: FILE:/etc/security/keytab/jhs.service.keytab -KVNO Timestamp Principal - 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC) - 4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5) ----- - -** Mapping from Kerberos principal to OS user account - - Hadoop maps Kerberos principal to OS user account using - the rule specified by <<>> - which works in the same way as the <<>> in - {{{http://web.mit.edu/Kerberos/krb5-latest/doc/admin/conf_files/krb5_conf.html}Kerberos configuration file (krb5.conf)}}. 
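
  As an illustrative sketch only (the property in question is hadoop.security.auth_to_local; the realm, rules and target account names below are placeholder assumptions, not part of the original document), a custom mapping could be added to core-site.xml along these lines:

----
<!-- Sketch only: realm, rules and target account names are placeholders. -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0]([nd]n@.*REALM.TLD)s/.*/hdfs/
    RULE:[2:$1@$0](jhs@.*REALM.TLD)s/.*/mapred/
    DEFAULT
  </value>
</property>
----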
- In addition, Hadoop <<>> mapping supports the <> flag that - lowercases the returned name. - - By default, it picks the first component of principal name as a user name - if the realms matches to the <<>> (usually defined in /etc/krb5.conf). - For example, <<>> is mapped to <<>> - by default rule. - -** Mapping from user to group - - Though files on HDFS are associated to owner and group, - Hadoop does not have the definition of group by itself. - Mapping from user to group is done by OS or LDAP. - - You can change a way of mapping by - specifying the name of mapping provider as a value of - <<>> - See {{{../hadoop-hdfs/HdfsPermissionsGuide.html}HDFS Permissions Guide}} for details. - - Practically you need to manage SSO environment using Kerberos with LDAP - for Hadoop in secure mode. - -** Proxy user - - Some products such as Apache Oozie which access the services of Hadoop - on behalf of end users need to be able to impersonate end users. - See {{{./Superusers.html}the doc of proxy user}} for details. - -** Secure DataNode - - Because the data transfer protocol of DataNode - does not use the RPC framework of Hadoop, - DataNode must authenticate itself by - using privileged ports which are specified by - <<>> and <<>>. - This authentication is based on the assumption - that the attacker won't be able to get root privileges. - - When you execute <<>> command as root, - server process binds privileged port at first, - then drops privilege and runs as the user account specified by - <<>>. - This startup process uses jsvc installed to <<>>. - You must specify <<>> and <<>> - as environment variables on start up (in hadoop-env.sh). - - As of version 2.6.0, SASL can be used to authenticate the data transfer - protocol. In this configuration, it is no longer required for secured clusters - to start the DataNode as root using jsvc and bind to privileged ports. To - enable SASL on data transfer protocol, set <<>> - in hdfs-site.xml, set a non-privileged port for <<>>, set - <<>> to and make sure the - <<>> environment variable is not defined. Note that it - is not possible to use SASL on data transfer protocol if - <<>> is set to a privileged port. This is required for - backwards-compatibility reasons. - - In order to migrate an existing cluster that used root authentication to start - using SASL instead, first ensure that version 2.6.0 or later has been deployed - to all cluster nodes as well as any external applications that need to connect - to the cluster. Only versions 2.6.0 and later of the HDFS client can connect - to a DataNode that uses SASL for authentication of data transfer protocol, so - it is vital that all callers have the correct version before migrating. After - version 2.6.0 or later has been deployed everywhere, update configuration of - any external applications to enable SASL. If an HDFS client is enabled for - SASL, then it can connect successfully to a DataNode running with either root - authentication or SASL authentication. Changing configuration for all clients - guarantees that subsequent configuration changes on DataNodes will not disrupt - the applications. Finally, each individual DataNode can be migrated by - changing its configuration and restarting. It is acceptable to have a mix of - some DataNodes running with root authentication and some DataNodes running with - SASL authentication temporarily during this migration period, because an HDFS - client enabled for SASL can connect to both. 
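
  As a rough sketch of the SASL-based setup described above (the protection level and port number shown here are illustrative assumptions, not values from the original document; HADOOP_SECURE_DN_USER must also remain unset in hadoop-env.sh), the relevant hdfs-site.xml entries might look like:

----
<!-- Sketch only: the port and protection level are illustrative assumptions. -->
<property>
  <name>dfs.data.transfer.protection</name>
  <!-- one of: authentication, integrity, privacy -->
  <value>integrity</value>
</property>
<property>
  <!-- must be a non-privileged port (above 1023) when SASL is used -->
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:10019</value>
</property>
<property>
  <!-- HTTPS_ONLY is required so that the HTTP servers are also authenticated -->
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
----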
- -* Data confidentiality - -** Data Encryption on RPC - - The data transfered between hadoop services and clients. - Setting <<>> to <<<"privacy">>> in the core-site.xml - activate data encryption. - -** Data Encryption on Block data transfer. - - You need to set <<>> to <<<"true">>> in the hdfs-site.xml - in order to activate data encryption for data transfer protocol of DataNode. - - Optionally, you may set <<>> to either - "3des" or "rc4" to choose the specific encryption algorithm. If unspecified, - then the configured JCE default on the system is used, which is usually 3DES. - - Setting <<>> to - <<>> activates AES encryption. By default, this is - unspecified, so AES is not used. When AES is used, the algorithm specified in - <<>> is still used during an initial key - exchange. The AES key bit length can be configured by setting - <<>> to 128, 192 or 256. The - default is 128. - - AES offers the greatest cryptographic strength and the best performance. At - this time, 3DES and RC4 have been used more often in Hadoop clusters. - -** Data Encryption on HTTP - - Data transfer between Web-console and clients are protected by using SSL(HTTPS). - - -* Configuration - -** Permissions for both HDFS and local fileSystem paths - - The following table lists various paths on HDFS and local filesystems (on - all nodes) and recommended permissions: - -*-------------------+-------------------+------------------+------------------+ -|| Filesystem || Path || User:Group || Permissions | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | hdfs:hadoop | drwx------ | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | hdfs:hadoop | drwx------ | -*-------------------+-------------------+------------------+------------------+ -| local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x | -*-------------------+-------------------+------------------+------------------+ -| local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | yarn:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | yarn:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ -| local | container-executor | root:hadoop | --Sr-s--- | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | root:hadoop | r-------- | -*-------------------+-------------------+------------------+------------------+ -| hdfs | / | hdfs:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ -| hdfs | /tmp | hdfs:hadoop | drwxrwxrwxt | -*-------------------+-------------------+------------------+------------------+ -| hdfs | /user | hdfs:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ -| hdfs | <<>> | yarn:hadoop | drwxrwxrwxt | -*-------------------+-------------------+------------------+------------------+ -| hdfs | <<>> | mapred:hadoop | | -| | | | drwxrwxrwxt | -*-------------------+-------------------+------------------+------------------+ -| hdfs | <<>> | mapred:hadoop | | -| | | | drwxr-x--- | -*-------------------+-------------------+------------------+------------------+ - -** Common Configurations - - In order to turn on RPC authentication in hadoop, - set the value of <<>> property to - <<<"kerberos">>>, and set security 
related settings listed below appropriately. - - The following properties should be in the <<>> of all the - nodes in the cluster. - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | <<>> : No authentication. (default) \ -| | | <<>> : Enable authentication by Kerberos. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Enable {{{./ServiceLevelAuth.html}RPC service-level authorization}}. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | -| | | : authentication only (default) \ -| | | : integrity check in addition to authentication \ -| | | : data encryption in addition to integrity | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | <<>>\ -| | <<>>\ -| | <...>\ -| | DEFAULT | -| | | The value is string containing new line characters. -| | | See -| | | {{{http://web.mit.edu/Kerberos/krb5-latest/doc/admin/conf_files/krb5_conf.html}Kerberos documentation}} -| | | for format for . -*-------------------------+-------------------------+------------------------+ -| <<>><<<.hosts>>> | | | -| | | comma separated hosts from which access are allowd to impersonation. | -| | | <<<*>>> means wildcard. | -*-------------------------+-------------------------+------------------------+ -| <<>><<<.groups>>> | | | -| | | comma separated groups to which users impersonated by belongs. | -| | | <<<*>>> means wildcard. | -*-------------------------+-------------------------+------------------------+ -Configuration for <<>> - -** NameNode - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Enable HDFS block access tokens for secure operations. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | This value is deprecated. Use dfs.http.policy | -*-------------------------+-------------------------+------------------------+ -| <<>> | or or | | -| | | HTTPS_ONLY turns off http access. This option takes precedence over | -| | | the deprecated configuration dfs.https.enable and hadoop.ssl.enabled. | -| | | If using SASL to authenticate data transfer protocol instead of | -| | | running DataNode as root and using privileged ports, then this property | -| | | must be set to to guarantee authentication of HTTP servers. | -| | | (See <<>>.) | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -*-------------------------+-------------------------+------------------------+ -| <<>> | <50470> | | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Kerberos keytab file for the NameNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | nn/_HOST@REALM.TLD | | -| | | Kerberos principal name for the NameNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | HTTP/_HOST@REALM.TLD | | -| | | HTTP Kerberos principal name for the NameNode. 
| -*-------------------------+-------------------------+------------------------+ -Configuration for <<>> - -** Secondary NameNode - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -*-------------------------+-------------------------+------------------------+ -| <<>> | <50470> | | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | | -| | | Kerberos keytab file for the Secondary NameNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | sn/_HOST@REALM.TLD | | -| | | Kerberos principal name for the Secondary NameNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | HTTP/_HOST@REALM.TLD | | -| | | HTTP Kerberos principal name for the Secondary NameNode. | -*-------------------------+-------------------------+------------------------+ -Configuration for <<>> - -** DataNode - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | 700 | | -*-------------------------+-------------------------+------------------------+ -| <<>> | <0.0.0.0:1004> | | -| | | Secure DataNode must use privileged port | -| | | in order to assure that the server was started securely. | -| | | This means that the server must be started via jsvc. | -| | | Alternatively, this must be set to a non-privileged port if using SASL | -| | | to authenticate data transfer protocol. | -| | | (See <<>>.) | -*-------------------------+-------------------------+------------------------+ -| <<>> | <0.0.0.0:1006> | | -| | | Secure DataNode must use privileged port | -| | | in order to assure that the server was started securely. | -| | | This means that the server must be started via jsvc. | -*-------------------------+-------------------------+------------------------+ -| <<>> | <0.0.0.0:50470> | | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Kerberos keytab file for the DataNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | dn/_HOST@REALM.TLD | | -| | | Kerberos principal name for the DataNode. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | set to <<>> when using data encryption | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | optionally set to <<<3des>>> or <<>> when using data encryption to | -| | | control encryption algorithm | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | optionally set to <<>> to activate AES encryption | -| | | when using data encryption | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | optionally set to <<<128>>>, <<<192>>> or <<<256>>> to control key bit | -| | | length when using AES with data encryption | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | : authentication only \ -| | | : integrity check in addition to authentication \ -| | | : data encryption in addition to integrity | -| | | This property is unspecified by default. 
Setting this property enables | -| | | SASL for authentication of data transfer protocol. If this is enabled, | -| | | then <<>> must use a non-privileged port, | -| | | <<>> must be set to and the | -| | | <<>> environment variable must be undefined when | -| | | starting the DataNode process. | -*-------------------------+-------------------------+------------------------+ -Configuration for <<>> - - -** WebHDFS - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | http/_HOST@REALM.TLD | | -| | | Kerberos keytab file for the WebHDFS. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Kerberos principal name for WebHDFS. | -*-------------------------+-------------------------+------------------------+ -Configuration for <<>> - - -** ResourceManager - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | | -| | | Kerberos keytab file for the ResourceManager. | -*-------------------------+-------------------------+------------------------+ -| <<>> | rm/_HOST@REALM.TLD | | -| | | Kerberos principal name for the ResourceManager. | -*-------------------------+-------------------------+------------------------+ -Configuration for <<>> - -** NodeManager - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Kerberos keytab file for the NodeManager. | -*-------------------------+-------------------------+------------------------+ -| <<>> | nm/_HOST@REALM.TLD | | -| | | Kerberos principal name for the NodeManager. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | <<>> | -| | | Use LinuxContainerExecutor. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Unix group of the NodeManager. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | The path to the executable of Linux container executor. | -*-------------------------+-------------------------+------------------------+ -Configuration for <<>> - -** Configuration for WebAppProxy - - The <<>> provides a proxy between the web applications - exported by an application and an end user. If security is enabled - it will warn users before accessing a potentially unsafe web application. - Authentication and authorization using the proxy is handled just like - any other privileged web application. - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | <<>> host:port for proxy to AM web apps. | | -| | | if this is the same as <<>>| -| | | or it is not defined then the <<>> will run the proxy| -| | | otherwise a standalone proxy server will need to be launched.| -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | | -| | | Kerberos keytab file for the WebAppProxy. 
| -*-------------------------+-------------------------+------------------------+ -| <<>> | wap/_HOST@REALM.TLD | | -| | | Kerberos principal name for the WebAppProxy. | -*-------------------------+-------------------------+------------------------+ -Configuration for <<>> - -** LinuxContainerExecutor - - A <<>> used by YARN framework which define how any - launched and controlled. - - The following are the available in Hadoop YARN: - -*--------------------------------------+--------------------------------------+ -|| ContainerExecutor || Description | -*--------------------------------------+--------------------------------------+ -| <<>> | | -| | The default executor which YARN uses to manage container execution. | -| | The container process has the same Unix user as the NodeManager. | -*--------------------------------------+--------------------------------------+ -| <<>> | | -| | Supported only on GNU/Linux, this executor runs the containers as either the | -| | YARN user who submitted the application (when full security is enabled) or | -| | as a dedicated user (defaults to nobody) when full security is not enabled. | -| | When full security is enabled, this executor requires all user accounts to be | -| | created on the cluster nodes where the containers are launched. It uses | -| | a executable that is included in the Hadoop distribution. | -| | The NodeManager uses this executable to launch and kill containers. | -| | The setuid executable switches to the user who has submitted the | -| | application and launches or kills the containers. For maximum security, | -| | this executor sets up restricted permissions and user/group ownership of | -| | local files and directories used by the containers such as the shared | -| | objects, jars, intermediate files, log files etc. Particularly note that, | -| | because of this, except the application owner and NodeManager, no other | -| | user can access any of the local files/directories including those | -| | localized as part of the distributed cache. | -*--------------------------------------+--------------------------------------+ - - To build the LinuxContainerExecutor executable run: - ----- - $ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/ ----- - - The path passed in <<<-Dcontainer-executor.conf.dir>>> should be the - path on the cluster nodes where a configuration file for the setuid - executable should be located. The executable should be installed in - $HADOOP_YARN_HOME/bin. - - The executable must have specific permissions: 6050 or --Sr-s--- - permissions user-owned by (super-user) and group-owned by a - special group (e.g. <<>>) of which the NodeManager Unix user is - the group member and no ordinary application user is. If any application - user belongs to this special group, security will be compromised. This - special group name should be specified for the configuration property - <<>> in both - <<>> and <<>>. - - For example, let's say that the NodeManager is run as user who is - part of the groups users and , any of them being the primary group. - Let also be that has both and another user - (application submitter) as its members, and does not - belong to . Going by the above description, the setuid/setgid - executable should be set 6050 or --Sr-s--- with user-owner as and - group-owner as which has as its member (and not - which has also as its member besides ). 
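
  For concreteness, the ownership and permission bits described above could be applied roughly as follows (the install path and the hadoop group are assumptions for this example; the group must match the configured yarn.nodemanager.linux-container-executor.group):

----
# Sketch only: the path and group name are assumptions for this example.
$ chown root:hadoop $HADOOP_YARN_HOME/bin/container-executor
$ chmod 6050 $HADOOP_YARN_HOME/bin/container-executor
----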
- - The LinuxTaskController requires that paths including and leading up to - the directories specified in <<>> and - <<>> to be set 755 permissions as described - above in the table on permissions on directories. - - * <<>> - - The executable requires a configuration file called - <<>> to be present in the configuration - directory passed to the mvn target mentioned above. - - The configuration file must be owned by the user running NodeManager - (user <<>> in the above example), group-owned by anyone and - should have the permissions 0400 or r--------. - - The executable requires following configuration items to be present - in the <<>> file. The items should be - mentioned as simple key=value pairs, one per-line: - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | | Unix group of the NodeManager. The group owner of the | -| | | binary should be this group. Should be same as the | -| | | value with which the NodeManager is configured. This configuration is | -| | | required for validating the secure access of the | -| | | binary. | -*-------------------------+-------------------------+------------------------+ -| <<>> | hdfs,yarn,mapred,bin | Banned users. | -*-------------------------+-------------------------+------------------------+ -| <<>> | foo,bar | Allowed system users. | -*-------------------------+-------------------------+------------------------+ -| <<>> | 1000 | Prevent other super-users. | -*-------------------------+-------------------------+------------------------+ -Configuration for <<>> - - To re-cap, here are the local file-sysytem permissions required for the - various paths related to the <<>>: - -*-------------------+-------------------+------------------+------------------+ -|| Filesystem || Path || User:Group || Permissions | -*-------------------+-------------------+------------------+------------------+ -| local | container-executor | root:hadoop | --Sr-s--- | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | root:hadoop | r-------- | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | yarn:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ -| local | <<>> | yarn:hadoop | drwxr-xr-x | -*-------------------+-------------------+------------------+------------------+ - -** MapReduce JobHistory Server - -*-------------------------+-------------------------+------------------------+ -|| Parameter || Value || Notes | -*-------------------------+-------------------------+------------------------+ -| <<>> | | | -| | MapReduce JobHistory Server | Default port is 10020. | -*-------------------------+-------------------------+------------------------+ -| <<>> | | -| | | | -| | | Kerberos keytab file for the MapReduce JobHistory Server. | -*-------------------------+-------------------------+------------------------+ -| <<>> | jhs/_HOST@REALM.TLD | | -| | | Kerberos principal name for the MapReduce JobHistory Server. 
| -*-------------------------+-------------------------+------------------------+ -Configuration for <<>> http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm deleted file mode 100644 index 86fb3d6..0000000 --- a/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm +++ /dev/null @@ -1,216 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. -~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. See accompanying LICENSE file. - - --- - Service Level Authorization Guide - --- - --- - ${maven.build.timestamp} - -Service Level Authorization Guide - -%{toc|section=1|fromDepth=0} - -* Purpose - - This document describes how to configure and manage Service Level - Authorization for Hadoop. - -* Prerequisites - - Make sure Hadoop is installed, configured and setup correctly. For more - information see: - - * {{{./SingleCluster.html}Single Node Setup}} for first-time users. - - * {{{./ClusterSetup.html}Cluster Setup}} for large, distributed clusters. - -* Overview - - Service Level Authorization is the initial authorization mechanism to - ensure clients connecting to a particular Hadoop service have the - necessary, pre-configured, permissions and are authorized to access the - given service. For example, a MapReduce cluster can use this mechanism - to allow a configured list of users/groups to submit jobs. - - The <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> configuration file is used to - define the access control lists for various Hadoop services. - - Service Level Authorization is performed much before to other access - control checks such as file-permission checks, access control on job - queues etc. - -* Configuration - - This section describes how to configure service-level authorization via - the configuration file <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>>. - -** Enable Service Level Authorization - - By default, service-level authorization is disabled for Hadoop. To - enable it set the configuration property hadoop.security.authorization - to true in <<<${HADOOP_CONF_DIR}/core-site.xml>>>. - -** Hadoop Services and Configuration Properties - - This section lists the various Hadoop services and their configuration - knobs: - -*-------------------------------------+--------------------------------------+ -|| Property || Service -*-------------------------------------+--------------------------------------+ -security.client.protocol.acl | ACL for ClientProtocol, which is used by user code via the DistributedFileSystem. -*-------------------------------------+--------------------------------------+ -security.client.datanode.protocol.acl | ACL for ClientDatanodeProtocol, the client-to-datanode protocol for block recovery. 
-*-------------------------------------+--------------------------------------+ -security.datanode.protocol.acl | ACL for DatanodeProtocol, which is used by datanodes to communicate with the namenode. -*-------------------------------------+--------------------------------------+ -security.inter.datanode.protocol.acl | ACL for InterDatanodeProtocol, the inter-datanode protocol for updating generation timestamp. -*-------------------------------------+--------------------------------------+ -security.namenode.protocol.acl | ACL for NamenodeProtocol, the protocol used by the secondary namenode to communicate with the namenode. -*-------------------------------------+--------------------------------------+ -security.inter.tracker.protocol.acl | ACL for InterTrackerProtocol, used by the tasktrackers to communicate with the jobtracker. -*-------------------------------------+--------------------------------------+ -security.job.submission.protocol.acl | ACL for JobSubmissionProtocol, used by job clients to communciate with the jobtracker for job submission, querying job status etc. -*-------------------------------------+--------------------------------------+ -security.task.umbilical.protocol.acl | ACL for TaskUmbilicalProtocol, used by the map and reduce tasks to communicate with the parent tasktracker. -*-------------------------------------+--------------------------------------+ -security.refresh.policy.protocol.acl | ACL for RefreshAuthorizationPolicyProtocol, used by the dfsadmin and mradmin commands to refresh the security policy in-effect. -*-------------------------------------+--------------------------------------+ -security.ha.service.protocol.acl | ACL for HAService protocol used by HAAdmin to manage the active and stand-by states of namenode. -*-------------------------------------+--------------------------------------+ - -** Access Control Lists - - <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> defines an access control list for - each Hadoop service. Every access control list has a simple format: - - The list of users and groups are both comma separated list of names. - The two lists are separated by a space. - - Example: <<>>. - - Add a blank at the beginning of the line if only a list of groups is to - be provided, equivalently a comma-separated list of users followed by - a space or nothing implies only a set of given users. - - A special value of <<<*>>> implies that all users are allowed to access the - service. - - If access control list is not defined for a service, the value of - <<>> is applied. If - <<>> is not defined, <<<*>>> is applied. - - ** Blocked Access Control Lists - - In some cases, it is required to specify blocked access control list for a service. This specifies - the list of users and groups who are not authorized to access the service. The format of - the blocked access control list is same as that of access control list. The blocked access - control list can be specified via <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>>. The property name - is derived by suffixing with ".blocked". - - Example: The property name of blocked access control list for <<> - will be <<>> - - For a service, it is possible to specify both an access control list and a blocked control - list. A user is authorized to access the service if the user is in the access control and not in - the blocked access control list. - - If blocked access control list is not defined for a service, the value of - <<>> is applied. If - <<>> is not defined, - empty blocked access control list is applied. 
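
  To make the ACL format concrete, a sketch of one such pair of properties in ${HADOOP_CONF_DIR}/hadoop-policy.xml might look like the following (the user and group names are placeholders, not part of the original document):

----
<!-- Sketch only: user and group names are placeholders. -->
<property>
  <name>security.client.protocol.acl</name>
  <value>alice,bob clientgroup</value>
</property>
<property>
  <name>security.client.protocol.acl.blocked</name>
  <value>eve</value>
</property>
----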
- - -** Refreshing Service Level Authorization Configuration - - The service-level authorization configuration for the NameNode and - JobTracker can be changed without restarting either of the Hadoop - master daemons. The cluster administrator can change - <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> on the master nodes and instruct - the NameNode and JobTracker to reload their respective configurations - via the <<<-refreshServiceAcl>>> switch to <<>> and <<>> commands - respectively. - - Refresh the service-level authorization configuration for the NameNode: - ----- - $ bin/hadoop dfsadmin -refreshServiceAcl ----- - - Refresh the service-level authorization configuration for the - JobTracker: - ----- - $ bin/hadoop mradmin -refreshServiceAcl ----- - - Of course, one can use the <<>> - property in <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> to restrict access to - the ability to refresh the service-level authorization configuration to - certain users/groups. - - ** Access Control using list of ip addresses, host names and ip ranges - - Access to a service can be controlled based on the ip address of the client accessing - the service. It is possible to restrict access to a service from a set of machines by - specifying a list of ip addresses, host names and ip ranges. The property name for each service - is derived from the corresponding acl's property name. If the property name of acl is - security.client.protocol.acl, property name for the hosts list will be - security.client.protocol.hosts. - - If hosts list is not defined for a service, the value of - <<>> is applied. If - <<>> is not defined, <<<*>>> is applied. - - It is possible to specify a blocked list of hosts. Only those machines which are in the - hosts list, but not in the blocked hosts list will be granted access to the service. The property - name is derived by suffixing with ".blocked". - - Example: The property name of blocked hosts list for <<> - will be <<>> - - If blocked hosts list is not defined for a service, the value of - <<>> is applied. If - <<>> is not defined, - empty blocked hosts list is applied. - -** Examples - - Allow only users <<>>, <<>> and users in the <<>> group to submit - jobs to the MapReduce cluster: - ----- - - security.job.submission.protocol.acl - alice,bob mapreduce - ----- - - Allow only DataNodes running as the users who belong to the group - datanodes to communicate with the NameNode: - ----- - - security.datanode.protocol.acl - datanodes - ----- - - Allow any user to talk to the HDFS cluster as a DFSClient: - ----- - - security.client.protocol.acl - * - ----- http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm deleted file mode 100644 index ef7532a..0000000 --- a/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm +++ /dev/null @@ -1,286 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. 
-~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. See accompanying LICENSE file. - - --- - Hadoop MapReduce Next Generation ${project.version} - Setting up a Single Node Cluster. - --- - --- - ${maven.build.timestamp} - -Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. - -%{toc|section=1|fromDepth=0} - -* Purpose - - This document describes how to set up and configure a single-node Hadoop - installation so that you can quickly perform simple operations using Hadoop - MapReduce and the Hadoop Distributed File System (HDFS). - -* Prerequisites - -** Supported Platforms - - * GNU/Linux is supported as a development and production platform. - Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes. - - * Windows is also a supported platform but the followings steps - are for Linux only. To set up Hadoop on Windows, see - {{{http://wiki.apache.org/hadoop/Hadoop2OnWindows}wiki page}}. - -** Required Software - - Required software for Linux include: - - [[1]] Java\u2122 must be installed. Recommended Java versions are described - at {{{http://wiki.apache.org/hadoop/HadoopJavaVersions} - HadoopJavaVersions}}. - - [[2]] ssh must be installed and sshd must be running to use the Hadoop - scripts that manage remote Hadoop daemons. - -** Installing Software - - If your cluster doesn't have the requisite software you will need to install - it. - - For example on Ubuntu Linux: - ----- - $ sudo apt-get install ssh - $ sudo apt-get install rsync ----- - -* Download - - To get a Hadoop distribution, download a recent stable release from one of - the {{{http://www.apache.org/dyn/closer.cgi/hadoop/common/} - Apache Download Mirrors}}. - -* Prepare to Start the Hadoop Cluster - - Unpack the downloaded Hadoop distribution. In the distribution, edit - the file <<>> to define some parameters as - follows: - ----- - # set to the root of your Java installation - export JAVA_HOME=/usr/java/latest - - # Assuming your installation directory is /usr/local/hadoop - export HADOOP_PREFIX=/usr/local/hadoop ----- - - Try the following command: - ----- - $ bin/hadoop ----- - - This will display the usage documentation for the hadoop script. - - Now you are ready to start your Hadoop cluster in one of the three supported - modes: - - * {{{Standalone Operation}Local (Standalone) Mode}} - - * {{{Pseudo-Distributed Operation}Pseudo-Distributed Mode}} - - * {{{Fully-Distributed Operation}Fully-Distributed Mode}} - -* Standalone Operation - - By default, Hadoop is configured to run in a non-distributed mode, as a - single Java process. This is useful for debugging. - - The following example copies the unpacked conf directory to use as input - and then finds and displays every match of the given regular expression. - Output is written to the given output directory. - ----- - $ mkdir input - $ cp etc/hadoop/*.xml input - $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-${project.version}.jar grep input output 'dfs[a-z.]+' - $ cat output/* ----- - -* Pseudo-Distributed Operation - - Hadoop can also be run on a single-node in a pseudo-distributed mode where - each Hadoop daemon runs in a separate Java process. 
- -** Configuration - - Use the following: - - etc/hadoop/core-site.xml: - -+---+ - - - fs.defaultFS - hdfs://localhost:9000 - - -+---+ - - etc/hadoop/hdfs-site.xml: - -+---+ - - - dfs.replication - 1 - - -+---+ - -** Setup passphraseless ssh - - Now check that you can ssh to the localhost without a passphrase: - ----- - $ ssh localhost ----- - - If you cannot ssh to localhost without a passphrase, execute the - following commands: - ----- - $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa - $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys ----- - -** Execution - - The following instructions are to run a MapReduce job locally. - If you want to execute a job on YARN, see {{YARN on Single Node}}. - - [[1]] Format the filesystem: - ----- - $ bin/hdfs namenode -format ----- - - [[2]] Start NameNode daemon and DataNode daemon: - ----- - $ sbin/start-dfs.sh ----- - - The hadoop daemon log output is written to the <<<${HADOOP_LOG_DIR}>>> - directory (defaults to <<<${HADOOP_HOME}/logs>>>). - - [[3]] Browse the web interface for the NameNode; by default it is - available at: - - * NameNode - <<>> - - [[4]] Make the HDFS directories required to execute MapReduce jobs: - ----- - $ bin/hdfs dfs -mkdir /user - $ bin/hdfs dfs -mkdir /user/ ----- - - [[5]] Copy the input files into the distributed filesystem: - ----- - $ bin/hdfs dfs -put etc/hadoop input ----- - - [[6]] Run some of the examples provided: - ----- - $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-${project.version}.jar grep input output 'dfs[a-z.]+' ----- - - [[7]] Examine the output files: - - Copy the output files from the distributed filesystem to the local - filesystem and examine them: - ----- - $ bin/hdfs dfs -get output output - $ cat output/* ----- - - or - - View the output files on the distributed filesystem: - ----- - $ bin/hdfs dfs -cat output/* ----- - - [[8]] When you're done, stop the daemons with: - ----- - $ sbin/stop-dfs.sh ----- - -** YARN on Single Node - - You can run a MapReduce job on YARN in a pseudo-distributed mode by setting - a few parameters and running ResourceManager daemon and NodeManager daemon - in addition. - - The following instructions assume that 1. ~ 4. steps of - {{{Execution}the above instructions}} are already executed. - - [[1]] Configure parameters as follows: - - etc/hadoop/mapred-site.xml: - -+---+ - - - mapreduce.framework.name - yarn - - -+---+ - - etc/hadoop/yarn-site.xml: - -+---+ - - - yarn.nodemanager.aux-services - mapreduce_shuffle - - -+---+ - - [[2]] Start ResourceManager daemon and NodeManager daemon: - ----- - $ sbin/start-yarn.sh ----- - - [[3]] Browse the web interface for the ResourceManager; by default it is - available at: - - * ResourceManager - <<>> - - [[4]] Run a MapReduce job. - - [[5]] When you're done, stop the daemons with: - ----- - $ sbin/stop-yarn.sh ----- - -* Fully-Distributed Operation - - For information on setting up fully-distributed, non-trivial clusters - see {{{./ClusterSetup.html}Cluster Setup}}. 
http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm deleted file mode 100644 index eb0c801..0000000 --- a/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm +++ /dev/null @@ -1,24 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. -~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. See accompanying LICENSE file. - - --- - Single Node Setup - --- - --- - ${maven.build.timestamp} - -Single Node Setup - - This page will be removed in the next major release. - - See {{{./SingleCluster.html}Single Cluster Setup}} to set up and configure a - single-node Hadoop installation. http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm deleted file mode 100644 index 78ed9a4..0000000 --- a/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm +++ /dev/null @@ -1,144 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. -~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. See accompanying LICENSE file. - - --- - Proxy user - Superusers Acting On Behalf Of Other Users - --- - --- - ${maven.build.timestamp} - -Proxy user - Superusers Acting On Behalf Of Other Users - -%{toc|section=1|fromDepth=0} - -* Introduction - - This document describes how a superuser can submit jobs or access hdfs - on behalf of another user. - -* Use Case - - The code example described in the next section is applicable for the - following use case. - - A superuser with username 'super' wants to submit job and access hdfs - on behalf of a user joe. The superuser has kerberos credentials but - user joe doesn't have any. The tasks are required to run as user joe - and any file accesses on namenode are required to be done as user joe. - It is required that user joe can connect to the namenode or job tracker - on a connection authenticated with super's kerberos credentials. In - other words super is impersonating the user joe. - - Some products such as Apache Oozie need this. - - -* Code example - - In this example super's credentials are used for login and a - proxy user ugi object is created for joe. 
The operations are performed - within the doAs method of this proxy user ugi object. - ----- - ... - //Create ugi for joe. The login user is 'super'. - UserGroupInformation ugi = - UserGroupInformation.createProxyUser("joe", UserGroupInformation.getLoginUser()); - ugi.doAs(new PrivilegedExceptionAction() { - public Void run() throws Exception { - //Submit a job - JobClient jc = new JobClient(conf); - jc.submitJob(conf); - //OR access hdfs - FileSystem fs = FileSystem.get(conf); - fs.mkdir(someFilePath); - } - } ----- - -* Configurations - - You can configure proxy user using properties - <<>> along with either or both of - <<>> - and <<>>. - - By specifying as below in core-site.xml, - the superuser named <<>> can connect - only from <<>> and <<>> - to impersonate a user belonging to <<>> and <<>>. - ----- - - hadoop.proxyuser.super.hosts - host1,host2 - - - hadoop.proxyuser.super.groups - group1,group2 - - ----- - - If these configurations are not present, impersonation will not be - allowed and connection will fail. - - If more lax security is preferred, the wildcard value * may be used to - allow impersonation from any host or of any user. - For example, by specifying as below in core-site.xml, - user named <<>> accessing from any host - can impersonate any user belonging to any group. - ----- - - hadoop.proxyuser.oozie.hosts - * - - - hadoop.proxyuser.oozie.groups - * - ----- - - The <<>> accepts list of ip addresses, - ip address ranges in CIDR format and/or host names. - For example, by specifying as below, - user named <<>> accessing from hosts in the range - <<<10.222.0.0-15>>> and <<<10.113.221.221>>> can impersonate - <<>> and <<>>. - ----- - - hadoop.proxyuser.super.hosts - 10.222.0.0/16,10.113.221.221 - - - hadoop.proxyuser.super.users - user1,user2 - ----- - - -* Caveats - - If the cluster is running in {{{./SecureMode.html}Secure Mode}}, - the superuser must have kerberos credentials to be able to impersonate - another user. - - It cannot use delegation tokens for this feature. It - would be wrong if superuser adds its own delegation token to the proxy - user ugi, as it will allow the proxy user to connect to the service - with the privileges of the superuser. - - However, if the superuser does want to give a delegation token to joe, - it must first impersonate joe and get a delegation token for joe, in - the same way as the code example above, and add it to the ugi of joe. - In this way the delegation token will have the owner as joe. http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm deleted file mode 100644 index c51037b..0000000 --- a/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm +++ /dev/null @@ -1,233 +0,0 @@ -~~ Licensed under the Apache License, Version 2.0 (the "License"); -~~ you may not use this file except in compliance with the License. -~~ You may obtain a copy of the License at -~~ -~~ http://www.apache.org/licenses/LICENSE-2.0 -~~ -~~ Unless required by applicable law or agreed to in writing, software -~~ distributed under the License is distributed on an "AS IS" BASIS, -~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -~~ See the License for the specific language governing permissions and -~~ limitations under the License. 
http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm b/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm
deleted file mode 100644
index c51037b..0000000
--- a/hadoop-common-project/hadoop-common/src/site/apt/Tracing.apt.vm
+++ /dev/null
@@ -1,233 +0,0 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  Hadoop Distributed File System-${project.version} - Enabling Dapper-like Tracing
  ---
  ---
  ${maven.build.timestamp}

Enabling Dapper-like Tracing in Hadoop

%{toc|section=1|fromDepth=0}

* {Dapper-like Tracing in Hadoop}

** HTrace

  {{{https://issues.apache.org/jira/browse/HDFS-5274}HDFS-5274}} added support for tracing requests through HDFS, using the open source tracing library, {{{https://git-wip-us.apache.org/repos/asf/incubator-htrace.git}Apache HTrace}}. Setting up tracing is quite simple; however, it requires some minor changes to your client code.

** Samplers

  Configure the samplers in <<<core-site.xml>>> with the <<<hadoop.htrace.sampler>>> property. The value can be NeverSampler, AlwaysSampler or ProbabilitySampler. NeverSampler: HTrace is OFF for all spans; AlwaysSampler: HTrace is ON for all spans; ProbabilitySampler: HTrace is ON for a configurable percentage of top-level spans.

+----
  <property>
    <name>hadoop.htrace.sampler</name>
    <value>NeverSampler</value>
  </property>
+----

** SpanReceivers

  The tracing system works by collecting information in structs called 'Spans'. It is up to you to choose how you want to receive this information by implementing the SpanReceiver interface, which defines one method:

+----
public void receiveSpan(Span span);
+----

  Configure which SpanReceivers you'd like to use by putting a comma-separated list of the fully qualified class names of classes implementing SpanReceiver into <<<core-site.xml>>> under the <<<hadoop.htrace.spanreceiver.classes>>> property.

+----
  <property>
    <name>hadoop.htrace.spanreceiver.classes</name>
    <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
  </property>
  <property>
    <name>hadoop.htrace.local-file-span-receiver.path</name>
    <value>/var/log/hadoop/htrace.out</value>
  </property>
+----

  You can omit the package name prefix if you use a span receiver bundled with HTrace.

+----
  <property>
    <name>hadoop.htrace.spanreceiver.classes</name>
    <value>LocalFileSpanReceiver</value>
  </property>
+----

** Setting up ZipkinSpanReceiver

  Instead of implementing SpanReceiver by yourself, you can use <<<ZipkinSpanReceiver>>>, which uses {{{https://github.com/twitter/zipkin}Zipkin}} for collecting and displaying tracing data.

  In order to use <<<ZipkinSpanReceiver>>>, you need to download and set up {{{https://github.com/twitter/zipkin}Zipkin}} first.

  You also need to add the jar of <<<htrace-zipkin>>> to the classpath of Hadoop on each node. Here is an example setup procedure.

+----
  $ git clone https://github.com/cloudera/htrace
  $ cd htrace/htrace-zipkin
  $ mvn compile assembly:single
  $ cp target/htrace-zipkin-*-jar-with-dependencies.jar $HADOOP_HOME/share/hadoop/common/lib/
+----

  The sample configuration for <<<ZipkinSpanReceiver>>> is shown below. By adding these to <<<core-site.xml>>> of the NameNode and DataNodes, <<<ZipkinSpanReceiver>>> is initialized on startup. You also need this configuration on the client node in addition to the servers.

+----
  <property>
    <name>hadoop.htrace.spanreceiver.classes</name>
    <value>ZipkinSpanReceiver</value>
  </property>
  <property>
    <name>hadoop.htrace.zipkin.collector-hostname</name>
    <value>192.168.1.2</value>
  </property>
  <property>
    <name>hadoop.htrace.zipkin.collector-port</name>
    <value>9410</value>
  </property>
+----
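  Before restarting the daemons, it can be worth confirming that the htrace-zipkin jar copied above is actually present on each node; a quick, illustrative check, assuming the tarball layout used in the setup procedure above:

+----
  $ ls $HADOOP_HOME/share/hadoop/common/lib/ | grep htrace-zipkin
+----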
** Dynamic update of tracing configuration

  You can use the <<<hadoop trace>>> command to see and update the tracing configuration of each server. You must specify the IPC server address of a namenode or datanode with the <<<-host>>> option. You need to run the command against each server if you want to update the configuration of all servers.

  <<<hadoop trace -list>>> shows the list of loaded span receivers associated with their ids.

+----
  $ hadoop trace -list -host 192.168.56.2:9000
  ID  CLASS
  1   org.apache.htrace.impl.LocalFileSpanReceiver

  $ hadoop trace -list -host 192.168.56.2:50020
  ID  CLASS
  1   org.apache.htrace.impl.LocalFileSpanReceiver
+----

  <<<hadoop trace -remove>>> removes a span receiver from a server. The <<<-remove>>> option takes the id of the span receiver as its argument.

+----
  $ hadoop trace -remove 1 -host 192.168.56.2:9000
  Removed trace span receiver 1
+----

  <<<hadoop trace -add>>> adds a span receiver to a server. You need to specify the class name of the span receiver as the argument of the <<<-class>>> option. You can specify the configuration associated with the span receiver by <<<-Ckey=value>>> options.

+----
  $ hadoop trace -add -class LocalFileSpanReceiver -Chadoop.htrace.local-file-span-receiver.path=/tmp/htrace.out -host 192.168.56.2:9000
  Added trace span receiver 2 with configuration hadoop.htrace.local-file-span-receiver.path = /tmp/htrace.out

  $ hadoop trace -list -host 192.168.56.2:9000
  ID  CLASS
  2   org.apache.htrace.impl.LocalFileSpanReceiver
+----

** Starting tracing spans by HTrace API

  In order to trace, you will need to wrap the traced logic with a <<tracing span>> as shown below. When there are running tracing spans, the tracing information is propagated to servers along with the RPC requests.

  In addition, you need to initialize <<<SpanReceiverHost>>> once per process.

+----
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.tracing.SpanReceiverHost;
import org.apache.htrace.Sampler;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

...

    SpanReceiverHost.getInstance(new HdfsConfiguration());

...

    TraceScope ts = Trace.startSpan("Gets", Sampler.ALWAYS);
    try {
      ... // traced logic
    } finally {
      if (ts != null) ts.close();
    }
+----

** Sample code for tracing

  The <<<TracingFsShell.java>>> shown below is a wrapper of FsShell which starts a tracing span before invoking the HDFS shell command.

+----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.tracing.SpanReceiverHost;
import org.apache.hadoop.util.ToolRunner;
import org.apache.htrace.Sampler;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

public class TracingFsShell {
  public static void main(String argv[]) throws Exception {
    Configuration conf = new Configuration();
    FsShell shell = new FsShell();
    conf.setQuietMode(false);
    shell.setConf(conf);
    SpanReceiverHost.getInstance(conf);
    int res = 0;
    TraceScope ts = null;
    try {
      ts = Trace.startSpan("FsShell", Sampler.ALWAYS);
      res = ToolRunner.run(shell, argv);
    } finally {
      shell.close();
      if (ts != null) ts.close();
    }
    System.exit(res);
  }
}
+----

  You can compile and execute this code as shown below.

+----
$ javac -cp `hadoop classpath` TracingFsShell.java
$ java -cp .:`hadoop classpath` TracingFsShell -ls /
+----
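  As a closing example tying the dynamic-update command to the ZipkinSpanReceiver configuration shown earlier, a receiver can also be attached at runtime without editing core-site.xml, provided the htrace-zipkin jar is already on the server's classpath; the collector hostname, port and IPC address below are the illustrative values used throughout this document, not defaults.

+----
  $ hadoop trace -add -class ZipkinSpanReceiver -Chadoop.htrace.zipkin.collector-hostname=192.168.1.2 -Chadoop.htrace.zipkin.collector-port=9410 -host 192.168.56.2:9000
+----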
http://git-wip-us.apache.org/repos/asf/hadoop/blob/343cffb0/hadoop-common-project/hadoop-common/src/site/markdown/CLIMiniCluster.md.vm
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/CLIMiniCluster.md.vm b/hadoop-common-project/hadoop-common/src/site/markdown/CLIMiniCluster.md.vm
new file mode 100644
index 0000000..25fecda
--- /dev/null
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/CLIMiniCluster.md.vm
@@ -0,0 +1,68 @@

Hadoop: CLI MiniCluster.
========================

* [Hadoop: CLI MiniCluster.](#Hadoop:_CLI_MiniCluster.)
    * [Purpose](#Purpose)
    * [Hadoop Tarball](#Hadoop_Tarball)
    * [Running the MiniCluster](#Running_the_MiniCluster)

Purpose
-------

Using the CLI MiniCluster, users can simply start and stop a single-node Hadoop cluster with a single command, and without the need to set any environment variables or manage configuration files. The CLI MiniCluster starts both a `YARN`/`MapReduce` and an `HDFS` cluster.

This is useful for cases where users want to quickly experiment with a real Hadoop cluster or test non-Java programs that rely on significant Hadoop functionality.

Hadoop Tarball
--------------

You should be able to obtain the Hadoop tarball from the release. Alternatively, you can create a tarball directly from the source:

    $ mvn clean install -DskipTests
    $ mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip

**NOTE:** You will need [protoc 2.5.0](http://code.google.com/p/protobuf/) installed.

The tarball should be available in the `hadoop-dist/target/` directory.

Running the MiniCluster
-----------------------

From inside the root directory of the extracted tarball, you can start the CLI MiniCluster using the following command:

    $ bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${project.version}-tests.jar minicluster -rmport RM_PORT -jhsport JHS_PORT

In the example command above, `RM_PORT` and `JHS_PORT` should be replaced by the user's choice of these port numbers. If not specified, random free ports will be used.

There are a number of command line arguments that users can use to control which services to start and to pass other configuration properties. The available command line arguments are:

    $ -D <property=value>    Options to pass into configuration object
    $ -datanodes <arg>       How many datanodes to start (default 1)
    $ -format                Format the DFS (default false)
    $ -help                  Prints option help.
    $ -jhsport <arg>         JobHistoryServer port (default 0--we choose)
    $ -namenode <arg>        URL of the namenode (default is either the DFS
    $                        cluster or a temporary dir)
    $ -nnport <arg>          NameNode port (default 0--we choose)
    $ -nodemanagers <arg>    How many nodemanagers to start (default 1)
    $ -nodfs                 Don't start a mini DFS cluster
    $ -nomr                  Don't start a mini MR cluster
    $ -rmport <arg>          ResourceManager port (default 0--we choose)
    $ -writeConfig <path>    Save configuration to this XML file.
    $ -writeDetails <path>   Write basic information to this JSON file.

To display this full list of available arguments, the user can pass the `-help` argument to the above command.
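As a concrete illustration of combining these arguments (the port number and output path below are arbitrary examples, not defaults), the following command starts a MiniCluster with two datanodes, formats the DFS, pins the NameNode port, and writes the generated configuration to an XML file that other tools can pick up:

    $ bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${project.version}-tests.jar minicluster -format -datanodes 2 -nnport 8020 -writeConfig /tmp/minicluster-site.xml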