hadoop-common-commits mailing list archives

From a..@apache.org
Subject svn commit: r1440245 - in /hadoop/common/trunk/hadoop-common-project/hadoop-common: ./ src/main/docs/src/documentation/content/xdocs/ src/site/apt/
Date Wed, 30 Jan 2013 01:52:15 GMT
Author: atm
Date: Wed Jan 30 01:52:14 2013
New Revision: 1440245

URL: http://svn.apache.org/viewvc?rev=1440245&view=rev
Log:
HADOOP-9221. Convert remaining xdocs to APT. Contributed by Andy Isaacson.

Added:
    hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/NativeLibraries.apt.vm
    hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm
    hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm
    hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm
Removed:
    hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/src/documentation/content/xdocs/Superusers.xml
    hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/src/documentation/content/xdocs/deployment_layout.xml
    hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/src/documentation/content/xdocs/native_libraries.xml
    hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/src/documentation/content/xdocs/service_level_auth.xml
    hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/src/documentation/content/xdocs/single_node_setup.xml
Modified:
    hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt

Modified: hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt?rev=1440245&r1=1440244&r2=1440245&view=diff
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt (original)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt Wed Jan 30 01:52:14
2013
@@ -336,6 +336,8 @@ Trunk (Unreleased)
 
     HADOOP-9190. packaging docs is broken. (Andy Isaacson via atm)
 
+    HADOOP-9221. Convert remaining xdocs to APT. (Andy Isaacson via atm)
+
 Release 2.0.3-alpha - Unreleased 
 
   INCOMPATIBLE CHANGES

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/NativeLibraries.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/NativeLibraries.apt.vm?rev=1440245&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/NativeLibraries.apt.vm
(added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/NativeLibraries.apt.vm
Wed Jan 30 01:52:14 2013
@@ -0,0 +1,183 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Native Libraries Guide
+  ---
+  ---
+  ${maven.build.timestamp}
+
+Native Libraries Guide
+
+%{toc|section=1|fromDepth=0}
+
+* Overview
+
+   This guide describes the native hadoop library and includes a small
+   discussion about native shared libraries.
+
+   Note: Depending on your environment, the term "native libraries" could
+   refer to all *.so's you need to compile, and the term "native
+   compression" could refer to all *.so's you need to compile that are
+   specifically related to compression. Currently, however, this document
+   only addresses the native hadoop library (<<<libhadoop.so>>>).
+
+* Native Hadoop Library
+
+   Hadoop has native implementations of certain components for performance
+   reasons and because equivalent Java implementations are not available.
+   These components are available in a single, dynamically-linked native
+   library called the native hadoop library. On the *nix platforms the
+   library is named <<<libhadoop.so>>>.
+
+* Usage
+
+   It is fairly easy to use the native hadoop library:
+
+    [[1]] Review the components.
+
+    [[2]] Review the supported platforms.
+
+    [[3]] Either download a hadoop release, which will include a pre-built
+       version of the native hadoop library, or build your own version of
+       the native hadoop library. Whether you download or build, the name
+       for the library is the same: libhadoop.so
+
+    [[4]] Install the compression codec development packages (>zlib-1.2,
+       >gzip-1.2):
+          + If you download the library, install one or more development
+            packages - whichever compression codecs you want to use with
+            your deployment.
+          + If you build the library, it is mandatory to install both
+            development packages.
+
+    [[5]] Check the runtime log files.
+
+* Components
+
+   The native hadoop library includes two components, the zlib and gzip
+   compression codecs:
+
+     * zlib
+
+     * gzip
+
+   The native hadoop library is required for the gzip codec to work.
+
+* Supported Platforms
+
+   The native hadoop library is supported on *nix platforms only. The
+   library does not work with Cygwin or the Mac OS X platform.
+
+   The native hadoop library is mainly used on the GNU/Linux platform and
+   has been tested on these distributions:
+
+     * RHEL4/Fedora
+
+     * Ubuntu
+
+     * Gentoo
+
+   On all the above distributions a 32/64-bit native hadoop library will
+   work with the corresponding 32/64-bit JVM.
+
+* Download
+
+   The pre-built 32-bit i386-Linux native hadoop library is available as
+   part of the hadoop distribution and is located in the <<<lib/native>>>
+   directory. You can download the hadoop distribution from Hadoop Common
+   Releases.
+
+   Be sure to install the zlib and/or gzip development packages -
+   whichever compression codecs you want to use with your deployment.
+
+* Build
+
+   The native hadoop library is written in ANSI C and is built using the
+   GNU autotools-chain (autoconf, autoheader, automake, autoscan,
+   libtool). This means it should be straightforward to build the library
+   on any platform with a standards-compliant C compiler and the GNU
+   autotools-chain (see the supported platforms).
+
+   The packages you need to install on the target platform are:
+
+     * C compiler (e.g. GNU C Compiler)
+
+     * GNU Autotools Chain: autoconf, automake, libtool
+
+     * zlib-development package (stable version >= 1.2.0)
+
+   Once you have installed the prerequisite packages, use the standard
+   hadoop build.xml file and pass along the compile.native flag (set to
+   true) to build the native hadoop library:
+
+----
+   $ ant -Dcompile.native=true <target>
+----
+
+   You should see the newly-built library in:
+
+----
+   $ build/native/<platform>/lib
+----
+
+   where <platform> is a combination of the system-properties:
+   ${os.name}-${os.arch}-${sun.arch.data.model} (for example,
+   Linux-i386-32).
+
+   Please note the following:
+
+     * It is mandatory to install both the zlib and gzip development
+       packages on the target platform in order to build the native hadoop
+       library; however, for deployment it is sufficient to install just
+       one package if you wish to use only one codec.
+
+     * It is necessary to have the correct 32/64 libraries for zlib,
+       depending on the 32/64 bit jvm for the target platform, in order to
+       build and deploy the native hadoop library.
+
+* Runtime
+
+   The bin/hadoop script ensures that the native hadoop library is on the
+   library path via the system property:
+   <<<-Djava.library.path=<path> >>>
+
+   During runtime, check the hadoop log files for your MapReduce tasks.
+
+     * If everything is all right, then:
+       <<<DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...>>>
+       <<<INFO util.NativeCodeLoader - Loaded the native-hadoop library>>>
+
+     * If something goes wrong, then:
+       <<<INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable>>>
+
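+   You can also check programmatically whether the native library was
+   loaded. Below is a minimal, illustrative sketch (the class name
+   NativeCheck is an assumption); it only requires the Hadoop jars on the
+   classpath and <<<libhadoop.so>>> on <<<java.library.path>>>:
+
+----
+import org.apache.hadoop.util.NativeCodeLoader;
+
+public class NativeCheck {
+  public static void main(String[] args) {
+    // true only if libhadoop.so was found on java.library.path and loaded
+    System.out.println("native hadoop loaded: "
+        + NativeCodeLoader.isNativeCodeLoaded());
+  }
+}
+----
+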
+* Native Shared Libraries
+
+   You can load any native shared library using DistributedCache for
+   distributing and symlinking the library files.
+
+   This example shows you how to distribute a shared library, mylib.so,
+   and load it from a MapReduce task.
+
+    [[1]] First copy the library to the HDFS:
+       <<<bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1>>>
+
+    [[2]] The job launching program should contain the following:
+       <<<DistributedCache.createSymlink(conf);>>>
+       <<<DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);>>>
+
+    [[3]] The MapReduce task can contain:
+       <<<System.loadLibrary("mylib.so");>>>
+
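+   Putting the three steps together, a minimal job-setup sketch might look
+   like the following (the class name, method name and the
+   <<<namenode:8020>>> address are illustrative assumptions; it uses the
+   classic <<<org.apache.hadoop.mapred>>> API):
+
+----
+import java.net.URI;
+
+import org.apache.hadoop.filecache.DistributedCache;
+import org.apache.hadoop.mapred.JobConf;
+
+public class CacheLibExample {
+  public static void configureNativeLib(JobConf conf) throws Exception {
+    // Symlink the cached file into each task's working directory as mylib.so
+    DistributedCache.createSymlink(conf);
+    DistributedCache.addCacheFile(
+        new URI("hdfs://namenode:8020/libraries/mylib.so.1#mylib.so"), conf);
+    // The task can then load the symlinked mylib.so as in step 3 above.
+  }
+}
+----
+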
+   Note: If you downloaded or built the native hadoop library, you don't
+   need to use DistributedCache to make the library available to your
+   MapReduce tasks.

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm?rev=1440245&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm
(added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ServiceLevelAuth.apt.vm
Wed Jan 30 01:52:14 2013
@@ -0,0 +1,164 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Service Level Authorization Guide
+  ---
+  ---
+  ${maven.build.timestamp}
+
+Service Level Authorization Guide
+
+%{toc|section=1|fromDepth=0}
+
+* Purpose
+
+   This document describes how to configure and manage Service Level
+   Authorization for Hadoop.
+
+* Prerequisites
+
+   Make sure Hadoop is installed, configured and set up correctly. For more
+   information see:
+     * Single Node Setup for first-time users.
+     * Cluster Setup for large, distributed clusters.
+
+* Overview
+
+   Service Level Authorization is the initial authorization mechanism that
+   ensures clients connecting to a particular Hadoop service have the
+   necessary, pre-configured permissions and are authorized to access the
+   given service. For example, a MapReduce cluster can use this mechanism
+   to allow a configured list of users/groups to submit jobs.
+
+   The <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> configuration file is used to
+   define the access control lists for various Hadoop services.
+
+   Service Level Authorization is performed well before other access
+   control checks such as file-permission checks, access control on job
+   queues, etc.
+
+* Configuration
+
+   This section describes how to configure service-level authorization via
+   the configuration file <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>>.
+
+** Enable Service Level Authorization
+
+   By default, service-level authorization is disabled for Hadoop. To
+   enable it, set the configuration property hadoop.security.authorization
+   to true in <<<${HADOOP_CONF_DIR}/core-site.xml>>>.
+
+** Hadoop Services and Configuration Properties
+
+   This section lists the various Hadoop services and their configuration
+   knobs:
+
+*-------------------------------------+--------------------------------------+
+|| Property                           || Service
+*-------------------------------------+--------------------------------------+
+security.client.protocol.acl          | ACL for ClientProtocol, which is used by user code via the DistributedFileSystem.
+*-------------------------------------+--------------------------------------+
+security.client.datanode.protocol.acl | ACL for ClientDatanodeProtocol, the client-to-datanode protocol for block recovery.
+*-------------------------------------+--------------------------------------+
+security.datanode.protocol.acl        | ACL for DatanodeProtocol, which is used by datanodes to communicate with the namenode.
+*-------------------------------------+--------------------------------------+
+security.inter.datanode.protocol.acl  | ACL for InterDatanodeProtocol, the inter-datanode protocol for updating generation timestamp.
+*-------------------------------------+--------------------------------------+
+security.namenode.protocol.acl        | ACL for NamenodeProtocol, the protocol used by the secondary namenode to communicate with the namenode.
+*-------------------------------------+--------------------------------------+
+security.inter.tracker.protocol.acl   | ACL for InterTrackerProtocol, used by the tasktrackers to communicate with the jobtracker.
+*-------------------------------------+--------------------------------------+
+security.job.submission.protocol.acl  | ACL for JobSubmissionProtocol, used by job clients to communicate with the jobtracker for job submission, querying job status etc.
+*-------------------------------------+--------------------------------------+
+security.task.umbilical.protocol.acl  | ACL for TaskUmbilicalProtocol, used by the map and reduce tasks to communicate with the parent tasktracker.
+*-------------------------------------+--------------------------------------+
+security.refresh.policy.protocol.acl  | ACL for RefreshAuthorizationPolicyProtocol, used by the dfsadmin and mradmin commands to refresh the security policy in-effect.
+*-------------------------------------+--------------------------------------+
+security.ha.service.protocol.acl      | ACL for HAService protocol used by HAAdmin to manage the active and stand-by states of namenode.
+*-------------------------------------+--------------------------------------+
+
+** Access Control Lists
+
+   <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> defines an access control list for
+   each Hadoop service. Every access control list has a simple format:
+
+   The list of users and the list of groups are both comma-separated lists
+   of names. The two lists are separated by a space.
+
+   Example: <<<user1,user2 group1,group2>>>.
+
+   Add a blank at the beginning of the line if only a list of groups is to
+   be provided; equivalently, a comma-separated list of users followed by a
+   space (or nothing) implies only the given set of users. For example,
+   <<< group1,group2>>> (note the leading blank) allows only members of the
+   groups group1 and group2.
+
+   A special value of <<<*>>> implies that all users are allowed to access the
+   service.
+
+** Refreshing Service Level Authorization Configuration
+
+   The service-level authorization configuration for the NameNode and
+   JobTracker can be changed without restarting either of the Hadoop
+   master daemons. The cluster administrator can change
+   <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> on the master nodes and instruct
+   the NameNode and JobTracker to reload their respective configurations
+   via the <<<-refreshServiceAcl>>> switch to the <<<dfsadmin>>> and
+   <<<mradmin>>> commands respectively.
+
+   Refresh the service-level authorization configuration for the NameNode:
+
+----
+   $ bin/hadoop dfsadmin -refreshServiceAcl
+----
+
+   Refresh the service-level authorization configuration for the
+   JobTracker:
+
+----
+   $ bin/hadoop mradmin -refreshServiceAcl
+----
+
+   Of course, one can use the <<<security.refresh.policy.protocol.acl>>>
+   property in <<<${HADOOP_CONF_DIR}/hadoop-policy.xml>>> to restrict the
+   ability to refresh the service-level authorization configuration to
+   certain users/groups.
+
+** Examples
+
+   Allow only users <<<alice>>>, <<<bob>>> and users in the <<<mapreduce>>>
+   group to submit jobs to the MapReduce cluster:
+
+----
+<property>
+     <name>security.job.submission.protocol.acl</name>
+     <value>alice,bob mapreduce</value>
+</property>
+----
+
+   Allow only DataNodes running as the users who belong to the group
+   datanodes to communicate with the NameNode:
+
+----
+<property>
+     <name>security.datanode.protocol.acl</name>
+     <value>datanodes</value>
+</property>
+----
+
+   Allow any user to talk to the HDFS cluster as a DFSClient:
+
+----
+<property>
+     <name>security.client.protocol.acl</name>
+     <value>*</value>
+</property>
+----

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm?rev=1440245&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm
(added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/SingleNodeSetup.apt.vm
Wed Jan 30 01:52:14 2013
@@ -0,0 +1,239 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Single Node Setup
+  ---
+  ---
+  ${maven.build.timestamp}
+
+Single Node Setup
+
+%{toc|section=1|fromDepth=0}
+
+* Purpose
+
+   This document describes how to set up and configure a single-node
+   Hadoop installation so that you can quickly perform simple operations
+   using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
+
+* Prerequisites
+
+** Supported Platforms
+
+     * GNU/Linux is supported as a development and production platform.
+       Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
+
+     * Win32 is supported as a development platform. Distributed operation
+       has not been well tested on Win32, so it is not supported as a
+       production platform.
+
+** Required Software
+
+   Required software for Linux and Windows includes:
+
+    [[1]] Java^TM 1.6.x, preferably from Sun, must be installed.
+
+    [[2]] ssh must be installed and sshd must be running to use the Hadoop
+       scripts that manage remote Hadoop daemons.
+
+   Additional requirements for Windows include:
+
+    [[1]] Cygwin - Required for shell support in addition to the required
+       software above.
+
+** Installing Software
+
+   If your cluster doesn't have the requisite software you will need to
+   install it.
+
+   For example on Ubuntu Linux:
+
+----
+   $ sudo apt-get install ssh
+   $ sudo apt-get install rsync
+----
+
+   On Windows, if you did not install the required software when you
+   installed cygwin, start the cygwin installer and select the packages:
+
+     * openssh - the Net category
+
+* Download
+
+   To get a Hadoop distribution, download a recent stable release from one
+   of the Apache Download Mirrors.
+
+* Prepare to Start the Hadoop Cluster
+
+   Unpack the downloaded Hadoop distribution. In the distribution, edit
+   the file <<<conf/hadoop-env.sh>>> to define at least <<<JAVA_HOME>>> to be
+   the root of your Java installation.
+
+   Try the following command:
+
+----
+   $ bin/hadoop
+----
+
+   This will display the usage documentation for the hadoop script.
+
+   Now you are ready to start your Hadoop cluster in one of the three
+   supported modes:
+
+     * Local (Standalone) Mode
+
+     * Pseudo-Distributed Mode
+
+     * Fully-Distributed Mode
+
+* Standalone Operation
+
+   By default, Hadoop is configured to run in a non-distributed mode, as a
+   single Java process. This is useful for debugging.
+
+   The following example copies the unpacked conf directory to use as
+   input and then finds and displays every match of the given regular
+   expression. Output is written to the given output directory.
+
+----
+   $ mkdir input
+   $ cp conf/*.xml input
+   $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
+   $ cat output/*
+----
+
+* Pseudo-Distributed Operation
+
+   Hadoop can also be run on a single node in a pseudo-distributed mode
+   where each Hadoop daemon runs in a separate Java process.
+
+** Configuration
+
+   Use the following:
+
+   conf/core-site.xml:
+
+----
+<configuration>
+     <property>
+         <name>fs.defaultFS</name>
+         <value>hdfs://localhost:9000</value>
+     </property>
+</configuration>
+----
+
+   conf/hdfs-site.xml:
+
+----
+<configuration>
+     <property>
+         <name>dfs.replication</name>
+         <value>1</value>
+     </property>
+</configuration>
+----
+
+   conf/mapred-site.xml:
+
+----
+<configuration>
+     <property>
+         <name>mapred.job.tracker</name>
+         <value>localhost:9001</value>
+     </property>
+</configuration>
+----
+
+** Setup passphraseless ssh
+
+   Now check that you can ssh to the localhost without a passphrase:
+
+----
+   $ ssh localhost
+----
+
+   If you cannot ssh to localhost without a passphrase, execute the
+   following commands:
+
+----
+   $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
+   $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
+----
+
+** Execution
+
+   Format a new distributed-filesystem:
+
+----
+   $ bin/hadoop namenode -format
+----
+
+   Start the hadoop daemons:
+
+----
+   $ bin/start-all.sh
+----
+
+   The hadoop daemon log output is written to the <<<${HADOOP_LOG_DIR}>>>
+   directory (defaults to <<<${HADOOP_PREFIX}/logs>>>).
+
+   Browse the web interface for the NameNode and the JobTracker; by
+   default they are available at:
+
+     * NameNode - <<<http://localhost:50070/>>>
+
+     * JobTracker - <<<http://localhost:50030/>>>
+
+   Copy the input files into the distributed filesystem:
+
+----
+   $ bin/hadoop fs -put conf input
+----
+
+   Run some of the examples provided:
+
+----
+   $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
+----
+
+   Examine the output files:
+
+   Copy the output files from the distributed filesystem to the local
+   filesystem and examine them:
+
+----
+   $ bin/hadoop fs -get output output
+   $ cat output/*
+----
+
+   or
+
+   View the output files on the distributed filesystem:
+
+----
+   $ bin/hadoop fs -cat output/*
+----
+
+   When you're done, stop the daemons with:
+
+----
+   $ bin/stop-all.sh
+----
+
+* Fully-Distributed Operation
+
+   For information on setting up fully-distributed, non-trivial clusters
+   see {{{Cluster Setup}}}.
+
+   Java and JNI are trademarks or registered trademarks of Sun
+   Microsystems, Inc. in the United States and other countries.

Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm?rev=1440245&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm
(added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/Superusers.apt.vm
Wed Jan 30 01:52:14 2013
@@ -0,0 +1,100 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Superusers Acting On Behalf Of Other Users
+  ---
+  ---
+  ${maven.build.timestamp}
+
+Superusers Acting On Behalf Of Other Users
+
+%{toc|section=1|fromDepth=0}
+
+* Introduction
+
+   This document describes how a superuser can submit jobs or access hdfs
+   on behalf of another user in a secured way.
+
+* Use Case
+
+   The code example described in the next section is applicable for the
+   following use case.
+
+   A superuser with username 'super' wants to submit a job and access hdfs
+   on behalf of a user joe. The superuser has kerberos credentials but
+   user joe doesn't have any. The tasks are required to run as user joe
+   and any file accesses on namenode are required to be done as user joe.
+   It is required that user joe can connect to the namenode or job tracker
+   on a connection authenticated with super's kerberos credentials. In
+   other words super is impersonating the user joe.
+
+* Code example
+
+   In this example super's kerberos credentials are used for login and a
+   proxy user ugi object is created for joe. The operations are performed
+   within the doAs method of this proxy user ugi object.
+
+----
+    ...
+    //Create ugi for joe. The login user is 'super'.
+    UserGroupInformation ugi =
+            UserGroupInformation.createProxyUser("joe", UserGroupInformation.getLoginUser());
+    ugi.doAs(new PrivilegedExceptionAction<Void>() {
+      public Void run() throws Exception {
+        //Submit a job
+        JobClient jc = new JobClient(conf);
+        jc.submitJob(conf);
+        //OR access hdfs
+        FileSystem fs = FileSystem.get(conf);
+        fs.mkdirs(someFilePath);
+        return null;
+      }
+    });
+----
+
+* Configurations
+
+   The superuser must be configured on the namenode and jobtracker to be
+   allowed to impersonate another user. The following configurations are
+   required.
+
+----
+   <property>
+     <name>hadoop.proxyuser.super.groups</name>
+     <value>group1,group2</value>
+     <description>Allow the superuser super to impersonate any member of the groups group1 and group2</description>
+   </property>
+   <property>
+     <name>hadoop.proxyuser.super.hosts</name>
+     <value>host1,host2</value>
+     <description>The superuser can connect only from host1 and host2 to impersonate a user</description>
+   </property>
+----
+
+   If these configurations are not present, impersonation will not be
+   allowed and the connection will fail.
+
+   If more lax security is preferred, the wildcard value * may be used to
+   allow impersonation from any host or of any user.
+
+* Caveats
+
+   The superuser must have kerberos credentials to be able to impersonate
+   another user. It cannot use delegation tokens for this feature. It
+   would be wrong if the superuser added its own delegation token to the
+   proxy user ugi, as that would allow the proxy user to connect to the
+   service with the privileges of the superuser.
+
+   However, if the superuser does want to give a delegation token to joe,
+   it must first impersonate joe and get a delegation token for joe, in
+   the same way as the code example above, and add it to the ugi of joe.
+   In this way the delegation token will have joe as its owner.
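+
+   A minimal sketch of that pattern (assuming HDFS as the service, a final
+   Configuration named conf, and "joe" as the renewer; the class and method
+   names are illustrative):
+
+----
+import java.security.PrivilegedExceptionAction;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.apache.hadoop.security.token.Token;
+
+public class TokenForJoe {
+  public static void addTokenToJoe(final Configuration conf) throws Exception {
+    UserGroupInformation joe =
+        UserGroupInformation.createProxyUser("joe", UserGroupInformation.getLoginUser());
+    // Impersonate joe so the fetched delegation token is owned by joe
+    Token<?> token = joe.doAs(new PrivilegedExceptionAction<Token<?>>() {
+      public Token<?> run() throws Exception {
+        FileSystem fs = FileSystem.get(conf);
+        return fs.getDelegationToken("joe");  // the renewer name is an assumption
+      }
+    });
+    // Attach the token to joe's ugi so later doAs calls can use it
+    joe.addToken(token);
+  }
+}
+----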


