From: ddas@apache.org
To: common-commits@hadoop.apache.org
Reply-To: common-dev@hadoop.apache.org
Subject: svn commit: r1169567 - in /hadoop/common/branches/branch-0.20-security: ./ src/packages/ src/packages/deb/init.d/ src/packages/rpm/init.d/ src/packages/templates/conf/
Date: Sun, 11 Sep 2011 22:53:54 -0000
Message-Id: <20110911225356.4E2E623888EA@eris.apache.org>

Author: ddas
Date: Sun Sep 11 22:53:52 2011
New Revision: 1169567

URL: http://svn.apache.org/viewvc?rev=1169567&view=rev
Log: HADOOP-7599. Script improvements to setup a secure Hadoop cluster. Contributed by Eric Yang.
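HADOOP-7599 reworks the packaging scripts so that a secure (Kerberos-enabled) cluster can be configured non-interactively. For orientation only, the following is a hedged sketch of how the reworked scripts might be driven, assembled from the usage text and option lists in the diff below; the hostnames, realm, keytab locations, and the /usr install prefix are illustrative assumptions rather than values from the commit.

# Generate the configuration on each node (assumes the packages install under /usr).
/usr/sbin/hadoop-setup-conf.sh --auto \
  --conf-dir=/etc/hadoop \
  --namenode-host=nn.example.com \
  --secondarynamenode-host=snn.example.com \
  --jobtracker-host=jt.example.com \
  --hdfs-user=hdfs \
  --mapreduce-user=mapred \
  --kerberos-realm=KERBEROS.EXAMPLE.COM \
  --keytab-dir=/etc/security/keytabs \
  --datanodes=dn1.example.com,dn2.example.com \
  --tasktrackers=dn1.example.com,dn2.example.com

# Then, on the namenode only, format HDFS and create the base directories.
/usr/sbin/hadoop-setup-hdfs.sh --format \
  --hdfs-user=hdfs \
  --mapreduce-user=mapred \
  --kerberos-realm=KERBEROS.EXAMPLE.COM \
  --hdfs-user-keytab=/etc/security/keytabs/hdfs.keytab

Passing --kerberos-realm switches the generated configuration to kerberos security, which in turn selects the LinuxTaskController and the privileged datanode ports, as set later in hadoop-setup-conf.sh.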
Added:
    hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/capacity-scheduler.xml
    hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/commons-logging.properties
    hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hadoop-policy.xml
    hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/mapred-queue-acls.xml
    hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/taskcontroller.cfg
Modified:
    hadoop/common/branches/branch-0.20-security/CHANGES.txt
    hadoop/common/branches/branch-0.20-security/build.xml
    hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-datanode
    hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-jobtracker
    hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-namenode
    hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-tasktracker
    hadoop/common/branches/branch-0.20-security/src/packages/hadoop-create-user.sh
    hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-conf.sh
    hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-hdfs.sh
    hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-single-node.sh
    hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-datanode
    hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-jobtracker
    hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-namenode
    hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-tasktracker
    hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/core-site.xml
    hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hadoop-env.sh
    hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hdfs-site.xml
    hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/mapred-site.xml

Modified: hadoop/common/branches/branch-0.20-security/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/CHANGES.txt?rev=1169567&r1=1169566&r2=1169567&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security/CHANGES.txt (original)
+++ hadoop/common/branches/branch-0.20-security/CHANGES.txt Sun Sep 11 22:53:52 2011
@@ -199,6 +199,9 @@ Release 0.20.205.0 - unreleased MAPREDUCE-2915. Ensure LTC passes java.library.path. (Kihwal Lee via acmurthy) + HADOOP-7599.
Script improvements to setup a secure Hadoop cluster + (Eric Yang via ddas) + Release 0.20.204.0 - 2011-8-25 NEW FEATURES Modified: hadoop/common/branches/branch-0.20-security/build.xml URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/build.xml?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/build.xml (original) +++ hadoop/common/branches/branch-0.20-security/build.xml Sun Sep 11 22:53:52 2011 @@ -1433,6 +1433,13 @@ + + + + + + + Modified: hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-datanode URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-datanode?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-datanode (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-datanode Sun Sep 11 22:53:52 2011 @@ -75,6 +75,7 @@ check_privsep_dir() { } export PATH="${PATH:+$PATH:}/usr/sbin:/usr/bin" +export HADOOP_PREFIX="/usr" case "$1" in start) Modified: hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-jobtracker URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-jobtracker?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-jobtracker (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-jobtracker Sun Sep 11 22:53:52 2011 @@ -67,6 +67,7 @@ check_privsep_dir() { } export PATH="${PATH:+$PATH:}/usr/sbin:/usr/bin" +export HADOOP_PREFIX="/usr" case "$1" in start) Modified: hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-namenode URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-namenode?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-namenode (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-namenode Sun Sep 11 22:53:52 2011 @@ -71,6 +71,7 @@ format() { } export PATH="${PATH:+$PATH:}/usr/sbin:/usr/bin" +export HADOOP_PREFIX="/usr" case "$1" in start) Modified: hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-tasktracker URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-tasktracker?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-tasktracker (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/deb/init.d/hadoop-tasktracker Sun Sep 11 22:53:52 2011 @@ -67,6 +67,7 @@ check_privsep_dir() { } export PATH="${PATH:+$PATH:}/usr/sbin:/usr/bin" +export HADOOP_PREFIX="/usr" case "$1" in start) Modified: hadoop/common/branches/branch-0.20-security/src/packages/hadoop-create-user.sh URL: 
http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/hadoop-create-user.sh?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/hadoop-create-user.sh (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/hadoop-create-user.sh Sun Sep 11 22:53:52 2011 @@ -30,29 +30,86 @@ usage() { echo " usage: $0 Require parameter: - -u Create user on HDFS + --config /etc/hadoop Location of Hadoop configuration file + -u Create user on HDFS Optional parameters: - -h Display this message + -h Display this message + --kerberos-realm=KERBEROS.EXAMPLE.COM Set Kerberos realm + --super-user=hdfs Set super user id + --super-user-keytab=/etc/security/keytabs/hdfs.keytab Set super user keytab location " exit 1 } -if [ $# != 2 ] ; then +OPTS=$(getopt \ + -n $0 \ + -o '' \ + -l 'kerberos-realm:' \ + -l 'super-user:' \ + -l 'super-user-keytab:' \ + -o 'h' \ + -o 'u' \ + -- "$@") + +if [ $? != 0 ] ; then usage exit 1 fi -while getopts "hu:" OPTION -do - case $OPTION in - u) - SETUP_USER=$2; shift 2 - ;; - h) +create_user() { + if [ "${SETUP_USER}" = "" ]; then + break + fi + HADOOP_HDFS_USER=${HADOOP_HDFS_USER:-hdfs} + export HADOOP_PREFIX + export HADOOP_CONF_DIR + export JAVA_HOME + export SETUP_USER=${SETUP_USER} + export SETUP_PATH=/user/${SETUP_USER} + + if [ ! "${KERBEROS_REALM}" = "" ]; then + # locate kinit cmd + if [ -e /etc/lsb-release ]; then + KINIT_CMD="/usr/bin/kinit -kt ${HDFS_USER_KEYTAB} ${HADOOP_HDFS_USER}" + else + KINIT_CMD="/usr/kerberos/bin/kinit -kt ${HDFS_USER_KEYTAB} ${HADOOP_HDFS_USER}" + fi + su -c "${KINIT_CMD}" ${HADOOP_HDFS_USER} + fi + + su -c "${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} fs -mkdir ${SETUP_PATH}" ${HADOOP_HDFS_USER} + su -c "${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} fs -chown ${SETUP_USER}:${SETUP_USER} ${SETUP_PATH}" ${HADOOP_HDFS_USER} + su -c "${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} fs -chmod 711 ${SETUP_PATH}" ${HADOOP_HDFS_USER} + + if [ "$?" == "0" ]; then + echo "User directory has been setup: ${SETUP_PATH}" + fi +} + +eval set -- "${OPTS}" +while true; do + case "$1" in + -u) + shift + ;; + --kerberos-realm) + KERBEROS_REALM=$2; shift 2 + ;; + --super-user) + HADOOP_HDFS_USER=$2; shift 2 + ;; + --super-user-keytab) + HDFS_USER_KEYTAB=$2; shift 2 + ;; + -h) usage ;; --) - shift ; break + while shift; do + SETUP_USER=$1 + create_user + done + break ;; *) echo "Unknown option: $1" @@ -62,15 +119,3 @@ do esac done -export HADOOP_PREFIX -export HADOOP_CONF_DIR -export JAVA_HOME -export SETUP_USER=${SETUP_USER} -export SETUP_PATH=/user/${SETUP_USER} - -su -c '${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} fs -mkdir ${SETUP_PATH}' hdfs -su -c '${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} fs -chown ${SETUP_USER}:${SETUP_USER} ${SETUP_PATH}' hdfs - -if [ "$?" 
== "0" ]; then - echo "User directory has been setup: ${SETUP_PATH}" -fi Modified: hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-conf.sh URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-conf.sh?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-conf.sh (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-conf.sh Sun Sep 11 22:53:52 2011 @@ -18,12 +18,8 @@ bin=`dirname "$0"` bin=`cd "$bin"; pwd` -if [ "$HADOOP_HOME" != "" ]; then - echo "Warning: \$HADOOP_HOME is deprecated." - echo -fi - -. "$bin"/../libexec/hadoop-config.sh +this="${BASH_SOURCE-$0}" +export HADOOP_PREFIX=`dirname "$this"`/.. usage() { echo " @@ -34,20 +30,48 @@ usage: $0 --default Setup configuration as default --conf-dir=/etc/hadoop Set configuration directory --datanode-dir=/var/lib/hadoop/hdfs/datanode Set datanode directory + --group=hadoop Set Hadoop group name -h Display this message - --jobtracker-url=hostname:9001 Set jobtracker url + --hdfs-user=hdfs Set HDFS user + --jobtracker-host=hostname Set jobtracker host + --namenode-host=hostname Set namenode host + --secondarynamenode-host=hostname Set secondary namenode host + --kerberos-realm=KERBEROS.EXAMPLE.COM Set Kerberos realm + --kinit-location=/usr/kerberos/bin/kinit Set kinit location + --keytab-dir=/etc/security/keytabs Set keytab directory --log-dir=/var/log/hadoop Set log directory --pid-dir=/var/run/hadoop Set pid directory - --hdfs-dir=/var/lib/hadoop/hdfs Set hdfs directory + --hdfs-dir=/var/lib/hadoop/hdfs Set HDFS directory + --hdfs-user-keytab=/home/hdfs/hdfs.keytab Set HDFS user key tab --mapred-dir=/var/lib/hadoop/mapred Set mapreduce directory + --mapreduce-user=mr Set mapreduce user + --mapreduce-user-keytab=/home/mr/hdfs.keytab Set mapreduce user key tab --namenode-dir=/var/lib/hadoop/hdfs/namenode Set namenode directory - --namenode-url=hdfs://hostname:9000/ Set namenode url --replication=3 Set replication factor --taskscheduler=org.apache.hadoop.mapred.JobQueueTaskScheduler Set task scheduler + --datanodes=hostname1,hostname2,... SET the datanodes + --tasktrackers=hostname1,hostname2,... 
SET the tasktrackers " exit 1 } +check_permission() { + TARGET=$1 + OWNER="0" + RESULT=0 + while [ "$TARGET" != "/" ]; do + PARENT=`dirname $TARGET` + NAME=`basename $TARGET` + OWNER=`ls -ln $PARENT | grep $NAME| awk '{print $3}'` + if [ "$OWNER" != "0" ]; then + RESULT=1 + break + fi + TARGET=`dirname $TARGET` + done + return $RESULT +} + template_generator() { REGEX='(\$\{[a-zA-Z_][a-zA-Z_0-9]*\})' cat $1 | @@ -65,18 +89,30 @@ OPTS=$(getopt \ -n $0 \ -o '' \ -l 'auto' \ + -l 'java-home:' \ -l 'conf-dir:' \ -l 'default' \ + -l 'group:' \ -l 'hdfs-dir:' \ -l 'namenode-dir:' \ -l 'datanode-dir:' \ -l 'mapred-dir:' \ - -l 'namenode-url:' \ - -l 'jobtracker-url:' \ + -l 'namenode-host:' \ + -l 'secondarynamenode-host:' \ + -l 'jobtracker-host:' \ -l 'log-dir:' \ -l 'pid-dir:' \ -l 'replication:' \ -l 'taskscheduler:' \ + -l 'hdfs-user:' \ + -l 'hdfs-user-keytab:' \ + -l 'mapreduce-user:' \ + -l 'mapreduce-user-keytab:' \ + -l 'keytab-dir:' \ + -l 'kerberos-realm:' \ + -l 'kinit-location:' \ + -l 'datanodes:' \ + -l 'tasktrackers:' \ -o 'h' \ -- "$@") @@ -95,6 +131,10 @@ while true ; do AUTOMATED=1 shift ;; + --java-home) + JAVA_HOME=$2; shift 2 + AUTOMATED=1 + ;; --conf-dir) HADOOP_CONF_DIR=$2; shift 2 AUTOMATED=1 @@ -102,6 +142,10 @@ while true ; do --default) AUTOMATED=1; shift ;; + --group) + HADOOP_GROUP=$2; shift 2 + AUTOMATED=1 + ;; -h) usage ;; @@ -121,11 +165,15 @@ while true ; do HADOOP_MAPRED_DIR=$2; shift 2 AUTOMATED=1 ;; - --namenode-url) + --namenode-host) HADOOP_NN_HOST=$2; shift 2 AUTOMATED=1 ;; - --jobtracker-url) + --secondarynamenode-host) + HADOOP_SNN_HOST=$2; shift 2 + AUTOMATED=1 + ;; + --jobtracker-host) HADOOP_JT_HOST=$2; shift 2 AUTOMATED=1 ;; @@ -145,6 +193,45 @@ while true ; do HADOOP_TASK_SCHEDULER=$2; shift 2 AUTOMATED=1 ;; + --hdfs-user) + HADOOP_HDFS_USER=$2; shift 2 + AUTOMATED=1 + ;; + --mapreduce-user) + HADOOP_MR_USER=$2; shift 2 + AUTOMATED=1 + ;; + --keytab-dir) + KEYTAB_DIR=$2; shift 2 + AUTOMATED=1 + ;; + --hdfs-user-keytab) + HDFS_KEYTAB=$2; shift 2 + AUTOMATED=1 + ;; + --mapreduce-user-keytab) + MR_KEYTAB=$2; shift 2 + AUTOMATED=1 + ;; + --kerberos-realm) + KERBEROS_REALM=$2; shift 2 + SECURITY_TYPE="kerberos" + AUTOMATED=1 + ;; + --kinit-location) + KINIT=$2; shift 2 + AUTOMATED=1 + ;; + --datanodes) + DATANODES=$2; shift 2 + AUTOMATED=1 + DATANODES=$(echo $DATANODES | tr ',' ' ') + ;; + --tasktrackers) + TASKTRACKERS=$2; shift 2 + AUTOMATED=1 + TASKTRACKERS=$(echo $TASKTRACKERS | tr ',' ' ') + ;; --) shift ; break ;; @@ -158,10 +245,11 @@ done AUTOSETUP=${AUTOSETUP:-1} JAVA_HOME=${JAVA_HOME:-/usr/java/default} -HADOOP_NN_HOST=${HADOOP_NN_HOST:-hdfs://`hostname`:9000/} +HADOOP_GROUP=${HADOOP_GROUP:-hadoop} +HADOOP_NN_HOST=${HADOOP_NN_HOST:-`hostname`} HADOOP_NN_DIR=${HADOOP_NN_DIR:-/var/lib/hadoop/hdfs/namenode} HADOOP_DN_DIR=${HADOOP_DN_DIR:-/var/lib/hadoop/hdfs/datanode} -HADOOP_JT_HOST=${HADOOP_JT_HOST:-`hostname`:9001} +HADOOP_JT_HOST=${HADOOP_JT_HOST:-`hostname`} HADOOP_HDFS_DIR=${HADOOP_HDFS_DIR:-/var/lib/hadoop/hdfs} HADOOP_MAPRED_DIR=${HADOOP_MAPRED_DIR:-/var/lib/hadoop/mapred} HADOOP_LOG_DIR=${HADOOP_LOG_DIR:-/var/log/hadoop} @@ -169,6 +257,25 @@ HADOOP_PID_DIR=${HADOOP_PID_DIR:-/var/lo HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop} HADOOP_REPLICATION=${HADOOP_RELICATION:-3} HADOOP_TASK_SCHEDULER=${HADOOP_TASK_SCHEDULER:-org.apache.hadoop.mapred.JobQueueTaskScheduler} +HADOOP_HDFS_USER=${HADOOP_HDFS_USER:-hdfs} +HADOOP_MR_USER=${HADOOP_MR_USER:-mr} +KEYTAB_DIR=${KEYTAB_DIR:-/etc/security/keytabs} 
+HDFS_KEYTAB=${HDFS_KEYTAB:-/home/hdfs/hdfs.keytab} +MR_KEYTAB=${MR_KEYTAB:-/home/mr/mr.keytab} +KERBEROS_REALM=${KERBEROS_REALM:-KERBEROS.EXAMPLE.COM} +SECURITY_TYPE=${SECURITY_TYPE:-simple} +KINIT=${KINIT:-/usr/kerberos/bin/kinit} +if [ "${SECURITY_TYPE}" = "kerberos" ]; then + TASK_CONTROLLER="org.apache.hadoop.mapred.LinuxTaskController" + HADOOP_DN_ADDR="0.0.0.0:1019" + HADOOP_DN_HTTP_ADDR="0.0.0.0:1022" + SECURITY="true" +else + TASK_CONTROLLER="org.apache.hadoop.mapred.DefaultTaskController" + HADDOP_DN_ADDR="0.0.0.0:50010" + HADOOP_DN_HTTP_ADDR="0.0.0.0:50075" + SECURITY="false" +fi if [ "${AUTOMATED}" != "1" ]; then echo "Setup Hadoop Configuration" @@ -179,13 +286,13 @@ if [ "${AUTOMATED}" != "1" ]; then read USER_HADOOP_LOG_DIR echo -n "Where would you like to put pid directory? (${HADOOP_PID_DIR}) " read USER_HADOOP_PID_DIR - echo -n "What is the url of the namenode? (${HADOOP_NN_HOST}) " + echo -n "What is the host of the namenode? (${HADOOP_NN_HOST}) " read USER_HADOOP_NN_HOST echo -n "Where would you like to put namenode data directory? (${HADOOP_NN_DIR}) " read USER_HADOOP_NN_DIR echo -n "Where would you like to put datanode data directory? (${HADOOP_DN_DIR}) " read USER_HADOOP_DN_DIR - echo -n "What is the url of the jobtracker? (${HADOOP_JT_HOST}) " + echo -n "What is the host of the jobtracker? (${HADOOP_JT_HOST}) " read USER_HADOOP_JT_HOST echo -n "Where would you like to put jobtracker/tasktracker data directory? (${HADOOP_MAPRED_DIR}) " read USER_HADOOP_MAPRED_DIR @@ -211,10 +318,10 @@ if [ "${AUTOMATED}" != "1" ]; then echo "Config directory : ${HADOOP_CONF_DIR}" echo "Log directory : ${HADOOP_LOG_DIR}" echo "PID directory : ${HADOOP_PID_DIR}" - echo "Namenode url : ${HADOOP_NN_HOST}" + echo "Namenode host : ${HADOOP_NN_HOST}" echo "Namenode directory : ${HADOOP_NN_DIR}" echo "Datanode directory : ${HADOOP_DN_DIR}" - echo "Jobtracker url : ${HADOOP_JT_HOST}" + echo "Jobtracker host : ${HADOOP_JT_HOST}" echo "Mapreduce directory : ${HADOOP_MAPRED_DIR}" echo "Task scheduler : ${HADOOP_TASK_SCHEDULER}" echo "JAVA_HOME directory : ${JAVA_HOME}" @@ -228,52 +335,179 @@ if [ "${AUTOMATED}" != "1" ]; then fi fi -rm -f core-site.xml >/dev/null -rm -f hdfs-site.xml >/dev/null -rm -f mapred-site.xml >/dev/null -rm -f hadoop-env.sh >/dev/null - -template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/core-site.xml core-site.xml -template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/hdfs-site.xml hdfs-site.xml -template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/mapred-site.xml mapred-site.xml -template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/hadoop-env.sh hadoop-env.sh - -chown root:hadoop hadoop-env.sh -chmod 755 hadoop-env.sh - if [ "${AUTOSETUP}" == "1" -o "${AUTOSETUP}" == "y" ]; then - mkdir -p ${HADOOP_HDFS_DIR} - mkdir -p ${HADOOP_NN_DIR} - mkdir -p ${HADOOP_DN_DIR} - mkdir -p ${HADOOP_MAPRED_DIR} + if [ -d ${KEYTAB_DIR} ]; then + chmod 700 ${KEYTAB_DIR}/* + chown ${HADOOP_MR_USER}:${HADOOP_GROUP} ${KEYTAB_DIR}/[jt]t.service.keytab + chown ${HADOOP_HDFS_USER}:${HADOOP_GROUP} ${KEYTAB_DIR}/[dns]n.service.keytab + fi + chmod 755 -R ${HADOOP_PREFIX}/sbin/*hadoop* + chmod 755 -R ${HADOOP_PREFIX}/bin/hadoop + chmod 755 -R ${HADOOP_PREFIX}/libexec/hadoop-config.sh + mkdir -p /home/${HADOOP_MR_USER} + chown ${HADOOP_MR_USER}:${HADOOP_GROUP} /home/${HADOOP_MR_USER} + HDFS_DIR=`echo ${HADOOP_HDFS_DIR} | sed -e 's/,/ /g'` + mkdir -p ${HDFS_DIR} + if [ -e ${HADOOP_NN_DIR} ]; then + rm -rf ${HADOOP_NN_DIR} + fi + DATANODE_DIR=`echo 
${HADOOP_DN_DIR} | sed -e 's/,/ /g'` + mkdir -p ${DATANODE_DIR} + MAPRED_DIR=`echo ${HADOOP_MAPRED_DIR} | sed -e 's/,/ /g'` + mkdir -p ${MAPRED_DIR} mkdir -p ${HADOOP_CONF_DIR} + check_permission ${HADOOP_CONF_DIR} + if [ $? == 1 ]; then + echo "Full path to ${HADOOP_CONF_DIR} should be owned by root." + exit 1 + fi + mkdir -p ${HADOOP_LOG_DIR} - mkdir -p ${HADOOP_LOG_DIR}/hdfs - mkdir -p ${HADOOP_LOG_DIR}/mapred + #create the log sub dir for diff users + mkdir -p ${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER} + mkdir -p ${HADOOP_LOG_DIR}/${HADOOP_MR_USER} + mkdir -p ${HADOOP_PID_DIR} - chown hdfs:hadoop ${HADOOP_HDFS_DIR} - chown hdfs:hadoop ${HADOOP_NN_DIR} - chown hdfs:hadoop ${HADOOP_DN_DIR} - chown mapred:hadoop ${HADOOP_MAPRED_DIR} - chown root:hadoop ${HADOOP_LOG_DIR} + chown ${HADOOP_HDFS_USER}:${HADOOP_GROUP} ${HDFS_DIR} + chown ${HADOOP_HDFS_USER}:${HADOOP_GROUP} ${DATANODE_DIR} + chmod 700 -R ${DATANODE_DIR} + chown ${HADOOP_MR_USER}:${HADOOP_GROUP} ${MAPRED_DIR} + chown ${HADOOP_HDFS_USER}:${HADOOP_GROUP} ${HADOOP_LOG_DIR} chmod 775 ${HADOOP_LOG_DIR} chmod 775 ${HADOOP_PID_DIR} - chown hdfs:hadoop ${HADOOP_LOG_DIR}/hdfs - chown mapred:hadoop ${HADOOP_LOG_DIR}/mapred - cp -f *.xml ${HADOOP_CONF_DIR} - cp -f hadoop-env.sh ${HADOOP_CONF_DIR} + chown root:${HADOOP_GROUP} ${HADOOP_PID_DIR} + + #change the permission and the owner + chmod 755 ${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER} + chown ${HADOOP_HDFS_USER}:${HADOOP_GROUP} ${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER} + chmod 755 ${HADOOP_LOG_DIR}/${HADOOP_MR_USER} + chown ${HADOOP_MR_USER}:${HADOOP_GROUP} ${HADOOP_LOG_DIR}/${HADOOP_MR_USER} + + if [ -e ${HADOOP_CONF_DIR}/core-site.xml ]; then + mv -f ${HADOOP_CONF_DIR}/core-site.xml ${HADOOP_CONF_DIR}/core-site.xml.bak + fi + if [ -e ${HADOOP_CONF_DIR}/hdfs-site.xml ]; then + mv -f ${HADOOP_CONF_DIR}/hdfs-site.xml ${HADOOP_CONF_DIR}/hdfs-site.xml.bak + fi + if [ -e ${HADOOP_CONF_DIR}/mapred-site.xml ]; then + mv -f ${HADOOP_CONF_DIR}/mapred-site.xml ${HADOOP_CONF_DIR}/mapred-site.xml.bak + fi + if [ -e ${HADOOP_CONF_DIR}/hadoop-env.sh ]; then + mv -f ${HADOOP_CONF_DIR}/hadoop-env.sh ${HADOOP_CONF_DIR}/hadoop-env.sh.bak + fi + if [ -e ${HADOOP_CONF_DIR}/hadoop-policy.xml ]; then + mv -f ${HADOOP_CONF_DIR}/hadoop-policy.xml ${HADOOP_CONF_DIR}/hadoop-policy.xml.bak + fi + if [ -e ${HADOOP_CONF_DIR}/mapred-queue-acls.xml ]; then + mv -f ${HADOOP_CONF_DIR}/mapred-queue-acls.xml ${HADOOP_CONF_DIR}/mapred-queue-acls.xml.bak + fi + if [ -e ${HADOOP_CONF_DIR}/commons-logging.properties ]; then + mv -f ${HADOOP_CONF_DIR}/commons-logging.properties ${HADOOP_CONF_DIR}/commons-logging.properties.bak + fi + if [ -e ${HADOOP_CONF_DIR}/taskcontroller.cfg ]; then + mv -f ${HADOOP_CONF_DIR}/taskcontroller.cfg ${HADOOP_CONF_DIR}/taskcontroller.cfg.bak + fi + if [ -e ${HADOOP_CONF_DIR}/slaves ]; then + mv -f ${HADOOP_CONF_DIR}/slaves ${HADOOP_CONF_DIR}/slaves.bak + fi + if [ -e ${HADOOP_CONF_DIR}/dfs.include ]; then + mv -f ${HADOOP_CONF_DIR}/dfs.include ${HADOOP_CONF_DIR}/dfs.include.bak + fi + if [ -e ${HADOOP_CONF_DIR}/dfs.exclude ]; then + mv -f ${HADOOP_CONF_DIR}/dfs.exclude ${HADOOP_CONF_DIR}/dfs.exclude.bak + fi + if [ -e ${HADOOP_CONF_DIR}/mapred.include ]; then + mv -f ${HADOOP_CONF_DIR}/mapred.include ${HADOOP_CONF_DIR}/mapred.include.bak + fi + if [ -e ${HADOOP_CONF_DIR}/mapred.exclude ]; then + mv -f ${HADOOP_CONF_DIR}/mapred.exclude ${HADOOP_CONF_DIR}/mapred.exclude.bak + fi + + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/core-site.xml ${HADOOP_CONF_DIR}/core-site.xml + 
template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/hdfs-site.xml ${HADOOP_CONF_DIR}/hdfs-site.xml + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/mapred-site.xml ${HADOOP_CONF_DIR}/mapred-site.xml + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/hadoop-env.sh ${HADOOP_CONF_DIR}/hadoop-env.sh + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/hadoop-policy.xml ${HADOOP_CONF_DIR}/hadoop-policy.xml + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/commons-logging.properties ${HADOOP_CONF_DIR}/commons-logging.properties + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/mapred-queue-acls.xml ${HADOOP_CONF_DIR}/mapred-queue-acls.xml + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/taskcontroller.cfg ${HADOOP_CONF_DIR}/taskcontroller.cfg + if [ ! -e ${HADOOP_CONF_DIR}/capacity-scheduler.xml ]; then + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/capacity-scheduler.xml ${HADOOP_CONF_DIR}/capacity-scheduler.xml + fi + + #set the owner of the hadoop dir to root + chown root ${HADOOP_PREFIX} + chown root:${HADOOP_GROUP} ${HADOOP_CONF_DIR}/hadoop-env.sh + chmod 755 ${HADOOP_CONF_DIR}/hadoop-env.sh + + #set taskcontroller + chown root:${HADOOP_GROUP} ${HADOOP_CONF_DIR}/taskcontroller.cfg + chmod 400 ${HADOOP_CONF_DIR}/taskcontroller.cfg + chown root:${HADOOP_GROUP} ${HADOOP_PREFIX}/bin/task-controller + chmod 6050 ${HADOOP_PREFIX}/bin/task-controller + + #generate the slaves file and include and exclude files for hdfs and mapred + echo '' > ${HADOOP_CONF_DIR}/slaves + echo '' > ${HADOOP_CONF_DIR}/dfs.include + echo '' > ${HADOOP_CONF_DIR}/dfs.exclude + echo '' > ${HADOOP_CONF_DIR}/mapred.include + echo '' > ${HADOOP_CONF_DIR}/mapred.exclude + for dn in $DATANODES + do + echo $dn >> ${HADOOP_CONF_DIR}/slaves + echo $dn >> ${HADOOP_CONF_DIR}/dfs.include + done + for tt in $TASKTRACKERS + do + echo $tt >> ${HADOOP_CONF_DIR}/mapred.include + done + echo "Configuration setup is completed." if [[ "$HADOOP_NN_HOST" =~ "`hostname`" ]]; then echo "Proceed to run hadoop-setup-hdfs.sh on namenode." 
fi else + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/core-site.xml ${HADOOP_CONF_DIR}/core-site.xml + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/hdfs-site.xml ${HADOOP_CONF_DIR}/hdfs-site.xml + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/mapred-site.xml ${HADOOP_CONF_DIR}/mapred-site.xml + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/hadoop-env.sh ${HADOOP_CONF_DIR}/hadoop-env.sh + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/hadoop-policy.xml ${HADOOP_CONF_DIR}/hadoop-policy.xml + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/commons-logging.properties ${HADOOP_CONF_DIR}/commons-logging.properties + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/mapred-queue-acls.xml ${HADOOP_CONF_DIR}/mapred-queue-acls.xml + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/taskcontroller.cfg ${HADOOP_CONF_DIR}/taskcontroller.cfg + template_generator ${HADOOP_PREFIX}/share/hadoop/templates/conf/hadoop-metrics2.properties ${HADOOP_CONF_DIR}/hadoop-metrics2.properties + + chown root:${HADOOP_GROUP} ${HADOOP_CONF_DIR}/hadoop-env.sh + chmod 755 ${HADOOP_CONF_DIR}/hadoop-env.sh + #set taskcontroller + chown root:${HADOOP_GROUP} ${HADOOP_CONF_DIR}/taskcontroller.cfg + chmod 400 ${HADOOP_CONF_DIR}/taskcontroller.cfg + chown root:${HADOOP_GROUP} ${HADOOP_PREFIX}/bin/task-controller + chmod 6050 ${HADOOP_PREFIX}/bin/task-controller + + #generate the slaves file and include and exclude files for hdfs and mapred + echo '' > ${HADOOP_CONF_DIR}/slaves + echo '' > ${HADOOP_CONF_DIR}/dfs.include + echo '' > ${HADOOP_CONF_DIR}/dfs.exclude + echo '' > ${HADOOP_CONF_DIR}/mapred.include + echo '' > ${HADOOP_CONF_DIR}/mapred.exclude + for dn in $DATANODES + do + echo $dn >> ${HADOOP_CONF_DIR}/slaves + echo $dn >> ${HADOOP_CONF_DIR}/dfs.include + done + for tt in $TASKTRACKERS + do + echo $tt >> ${HADOOP_CONF_DIR}/mapred.include + done + echo - echo "Configuration file has been generated, please copy:" + echo "Configuration file has been generated in:" echo - echo "core-site.xml" - echo "hdfs-site.xml" - echo "mapred-site.xml" - echo "hadoop-env.sh" + echo "${HADOOP_CONF_DIR}/core-site.xml" + echo "${HADOOP_CONF_DIR}/hdfs-site.xml" + echo "${HADOOP_CONF_DIR}/mapred-site.xml" + echo "${HADOOP_CONF_DIR}/hadoop-env.sh" echo echo " to ${HADOOP_CONF_DIR} on all nodes, and proceed to run hadoop-setup-hdfs.sh on namenode." fi Modified: hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-hdfs.sh URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-hdfs.sh?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-hdfs.sh (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-hdfs.sh Sun Sep 11 22:53:52 2011 @@ -18,30 +18,128 @@ bin=`dirname "$0"` bin=`cd "$bin"; pwd` -if [ "$HADOOP_HOME" != "" ]; then - echo "Warning: \$HADOOP_HOME is deprecated." - echo +. 
"$bin"/../libexec/hadoop-config.sh + +usage() { + echo " +usage: $0 + + Optional parameters: + --format Force namenode format + --group=hadoop Set Hadoop group + -h Display this message + --hdfs-user=hdfs Set HDFS user + --kerberos-realm=KERBEROS.EXAMPLE.COM Set Kerberos realm + --hdfs-user-keytab=/home/hdfs/hdfs.keytab Set HDFS user key tab + --mapreduce-user=mr Set mapreduce user + " + exit 1 +} + +OPTS=$(getopt \ + -n $0 \ + -o '' \ + -l 'format' \ + -l 'hdfs-user:' \ + -l 'hdfs-user-keytab:' \ + -l 'mapreduce-user:' \ + -l 'kerberos-realm:' \ + -o 'h' \ + -- "$@") + +if [ $? != 0 ] ; then + usage fi -. "$bin"/../libexec/hadoop-config.sh +eval set -- "${OPTS}" +while true ; do + case "$1" in + --format) + FORMAT_NAMENODE=1; shift + AUTOMATED=1 + ;; + --group) + HADOOP_GROUP=$2; shift 2 + AUTOMATED=1 + ;; + --hdfs-user) + HADOOP_HDFS_USER=$2; shift 2 + AUTOMATED=1 + ;; + --mapreduce-user) + HADOOP_MR_USER=$2; shift 2 + AUTOMATED=1 + ;; + --hdfs-user-keytab) + HDFS_KEYTAB=$2; shift 2 + AUTOMATED=1 + ;; + --kerberos-realm) + KERBEROS_REALM=$2; shift 2 + AUTOMATED=1 + ;; + --) + shift ; break + ;; + *) + echo "Unknown option: $1" + usage + exit 1 + ;; + esac +done + +HADOOP_GROUP=${HADOOP_GROUP:-hadoop} +HADOOP_HDFS_USER=${HADOOP_HDFS_USER:-hdfs} +HADOOP_MAPREDUCE_USER=${HADOOP_MR_USER:-mapred} + +if [ "${KERBEROS_REALM}" != "" ]; then + # Determine kerberos location base on Linux distro. + if [ -e /etc/lsb-release ]; then + KERBEROS_BIN=/usr/bin + else + KERBEROS_BIN=/usr/kerberos/bin + fi + kinit_cmd="${KERBEROS_BIN}/kinit -k -t ${HDFS_KEYTAB} ${HADOOP_HDFS_USER}" + su -c "${kinit_cmd}" ${HADOOP_HDFS_USER} +fi echo "Setup Hadoop Distributed File System" echo -echo "Formatting namenode" -echo -su -c '${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} namenode -format' hdfs -echo + +# Format namenode +if [ "${FORMAT_NAMENODE}" == "1" ]; then + echo "Formatting namenode" + echo + su -c "echo Y | ${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} namenode -format" ${HADOOP_HDFS_USER} + echo +fi + +# Start namenode process echo "Starting namenode process" echo -/etc/init.d/hadoop-namenode start +if [ -e ${HADOOP_PREFIX}/sbin/hadoop-daemon.sh ]; then + DAEMON_PATH=${HADOOP_PREFIX}/sbin +else + DAEMON_PATH=${HADOOP_PREFIX}/bin +fi +su -c "${DAEMON_PATH}/hadoop-daemon.sh --config ${HADOOP_CONF_DIR} start namenode" ${HADOOP_HDFS_USER} echo echo "Initialize HDFS file system: " echo -su -c '${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} dfs -mkdir /user/mapred' hdfs -su -c '${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} dfs -chown mapred:mapred /user/mapred' hdfs -su -c '${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} dfs -mkdir /tmp' hdfs -su -c '${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} dfs -chmod 777 /tmp' hdfs +#create the /user dir +su -c "${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} dfs -mkdir /user" ${HADOOP_HDFS_USER} +su -c "${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} dfs -chmod 755 /user" ${HADOOP_HDFS_USER} + +#create /tmp and give it 777 +su -c "${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} dfs -mkdir /tmp" ${HADOOP_HDFS_USER} +su -c "${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} dfs -chmod 777 /tmp" ${HADOOP_HDFS_USER} + +#create /mapred +su -c "${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} dfs -mkdir /mapred" ${HADOOP_HDFS_USER} +su -c "${HADOOP_PREFIX}/bin/hadoop --config ${HADOOP_CONF_DIR} dfs -chmod 755 /mapred" ${HADOOP_HDFS_USER} +su -c "${HADOOP_PREFIX}/bin/hadoop --config 
${HADOOP_CONF_DIR} dfs -chown ${HADOOP_MAPREDUCE_USER}:${HADOOP_GROUP} /mapred" ${HADOOP_HDFS_USER} if [ $? -eq 0 ]; then echo "Completed." Modified: hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-single-node.sh URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-single-node.sh?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-single-node.sh (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/hadoop-setup-single-node.sh Sun Sep 11 22:53:52 2011 @@ -136,10 +136,10 @@ SET_REBOOT=${SET_REBOOT:-y} if [ "${SET_CONFIG}" == "y" ]; then JAVA_HOME=${JAVA_HOME:-/usr/java/default} - HADOOP_NN_HOST=${HADOOP_NN_HOST:-hdfs://localhost:9000/} + HADOOP_NN_HOST=${HADOOP_NN_HOST:-localhost} HADOOP_NN_DIR=${HADOOP_NN_DIR:-/var/lib/hadoop/hdfs/namenode} HADOOP_DN_DIR=${HADOOP_DN_DIR:-/var/lib/hadoop/hdfs/datanode} - HADOOP_JT_HOST=${HADOOP_JT_HOST:-localhost:9001} + HADOOP_JT_HOST=${HADOOP_JT_HOST:-localhost} HADOOP_HDFS_DIR=${HADOOP_MAPRED_DIR:-/var/lib/hadoop/hdfs} HADOOP_MAPRED_DIR=${HADOOP_MAPRED_DIR:-/var/lib/hadoop/mapred} HADOOP_PID_DIR=${HADOOP_PID_DIR:-/var/run/hadoop} @@ -147,15 +147,17 @@ if [ "${SET_CONFIG}" == "y" ]; then HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop} HADOOP_REPLICATION=${HADOOP_RELICATION:-1} ${HADOOP_PREFIX}/sbin/hadoop-setup-conf.sh --auto \ + --hdfs-user=hdfs \ + --mapreduce-user=mapred \ --conf-dir=${HADOOP_CONF_DIR} \ --datanode-dir=${HADOOP_DN_DIR} \ --hdfs-dir=${HADOOP_HDFS_DIR} \ - --jobtracker-url=${HADOOP_JT_HOST} \ + --jobtracker-host=${HADOOP_JT_HOST} \ --log-dir=${HADOOP_LOG_DIR} \ --pid-dir=${HADOOP_PID_DIR} \ --mapred-dir=${HADOOP_MAPRED_DIR} \ --namenode-dir=${HADOOP_NN_DIR} \ - --namenode-url=${HADOOP_NN_HOST} \ + --namenode-host=${HADOOP_NN_HOST} \ --replication=${HADOOP_REPLICATION} fi Modified: hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-datanode URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-datanode?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-datanode (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-datanode Sun Sep 11 22:53:52 2011 @@ -27,6 +27,7 @@ source /etc/default/hadoop-env.sh RETVAL=0 PIDFILE="${HADOOP_PID_DIR}/hadoop-hdfs-datanode.pid" desc="Hadoop datanode daemon" +HADOOP_PREFIX="/usr" start() { echo -n $"Starting $desc (hadoop-datanode): " Modified: hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-jobtracker URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-jobtracker?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-jobtracker (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-jobtracker Sun Sep 11 22:53:52 2011 @@ -27,6 +27,7 @@ source /etc/default/hadoop-env.sh RETVAL=0 PIDFILE="${HADOOP_PID_DIR}/hadoop-mapred-jobtracker.pid" desc="Hadoop jobtracker daemon" +export HADOOP_PREFIX="/usr" start() { echo -n $"Starting $desc (hadoop-jobtracker): " 
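In this commit, hadoop-create-user.sh switches from bash getopts to getopt(1) long options, while hadoop-setup-conf.sh and hadoop-setup-hdfs.sh grow new long options parsed the same way. The fragment below is a minimal, self-contained sketch of that shared pattern; the two long options shown are only a small subset of what the real scripts accept.

#!/bin/bash
# Sketch of the getopt(1) long-option pattern used by the setup scripts above.
OPTS=$(getopt \
  -n "$0" \
  -o 'h' \
  -l 'kerberos-realm:' \
  -l 'hdfs-user:' \
  -- "$@")

if [ $? != 0 ]; then
  echo "usage: $0 [--kerberos-realm=REALM] [--hdfs-user=USER]" >&2
  exit 1
fi

eval set -- "${OPTS}"
while true; do
  case "$1" in
    --kerberos-realm)
      KERBEROS_REALM=$2; shift 2
      ;;
    --hdfs-user)
      HADOOP_HDFS_USER=$2; shift 2
      ;;
    -h)
      echo "usage: $0 [--kerberos-realm=REALM] [--hdfs-user=USER]"
      exit 0
      ;;
    --)
      shift; break
      ;;
    *)
      echo "Unknown option: $1" >&2
      exit 1
      ;;
  esac
done

echo "realm=${KERBEROS_REALM:-unset} hdfs user=${HADOOP_HDFS_USER:-hdfs}"

Because getopt normalizes the argument list, --kerberos-realm=FOO and --kerberos-realm FOO both arrive as two tokens, which is why each branch shifts by two.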
Modified: hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-namenode URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-namenode?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-namenode (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-namenode Sun Sep 11 22:53:52 2011 @@ -27,6 +27,7 @@ source /etc/default/hadoop-env.sh RETVAL=0 PIDFILE="${HADOOP_PID_DIR}/hadoop-hdfs-namenode.pid" desc="Hadoop namenode daemon" +export HADOOP_PREFIX="/usr" start() { echo -n $"Starting $desc (hadoop-namenode): " Modified: hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-tasktracker URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-tasktracker?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-tasktracker (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/rpm/init.d/hadoop-tasktracker Sun Sep 11 22:53:52 2011 @@ -27,6 +27,7 @@ source /etc/default/hadoop-env.sh RETVAL=0 PIDFILE="${HADOOP_PID_DIR}/hadoop-mapred-tasktracker.pid" desc="Hadoop tasktracker daemon" +export HADOOP_PREFIX="/usr" start() { echo -n $"Starting $desc (hadoop-tasktracker): " Added: hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/capacity-scheduler.xml URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/capacity-scheduler.xml?rev=1169567&view=auto ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/capacity-scheduler.xml (added) +++ hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/capacity-scheduler.xml Sun Sep 11 22:53:52 2011 @@ -0,0 +1,178 @@ + + + + + + + + + + + mapred.capacity-scheduler.maximum-system-jobs + 3000 + Maximum number of jobs in the system which can be initialized, + concurrently, by the CapacityScheduler. + + + + + mapred.capacity-scheduler.queue.default.capacity + 100 + Percentage of the number of slots in the cluster that are + to be available for jobs in this queue. + + + + + mapred.capacity-scheduler.queue.default.maximum-capacity + -1 + + maximum-capacity defines a limit beyond which a queue cannot use the capacity of the cluster. + This provides a means to limit how much excess capacity a queue can use. By default, there is no limit. + The maximum-capacity of a queue can only be greater than or equal to its minimum capacity. + Default value of -1 implies a queue can use complete capacity of the cluster. + + This property could be to curtail certain jobs which are long running in nature from occupying more than a + certain percentage of the cluster, which in the absence of pre-emption, could lead to capacity guarantees of + other queues being affected. + + One important thing to note is that maximum-capacity is a percentage , so based on the cluster's capacity + the max capacity would change. So if large no of nodes or racks get added to the cluster , max Capacity in + absolute terms would increase accordingly. 
+ + + + + mapred.capacity-scheduler.queue.default.supports-priority + false + If true, priorities of jobs will be taken into + account in scheduling decisions. + + + + + mapred.capacity-scheduler.queue.default.minimum-user-limit-percent + 100 + Each queue enforces a limit on the percentage of resources + allocated to a user at any given time, if there is competition for them. + This user limit can vary between a minimum and maximum value. The former + depends on the number of users who have submitted jobs, and the latter is + set to this property value. For example, suppose the value of this + property is 25. If two users have submitted jobs to a queue, no single + user can use more than 50% of the queue resources. If a third user submits + a job, no single user can use more than 33% of the queue resources. With 4 + or more users, no user can use more than 25% of the queue's resources. A + value of 100 implies no user limits are imposed. + + + + + mapred.capacity-scheduler.queue.default.user-limit-factor + 1 + The multiple of the queue capacity which can be configured to + allow a single user to acquire more slots. + + + + + mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks + 200000 + The maximum number of tasks, across all jobs in the queue, + which can be initialized concurrently. Once the queue's jobs exceed this + limit they will be queued on disk. + + + + + mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks-per-user + 100000 + The maximum number of tasks per-user, across all the of the + user's jobs in the queue, which can be initialized concurrently. Once the + user's jobs exceed this limit they will be queued on disk. + + + + + mapred.capacity-scheduler.queue.default.init-accept-jobs-factor + 10 + The multipe of (maximum-system-jobs * queue-capacity) used to + determine the number of jobs which are accepted by the scheduler. + + + + + + + + mapred.capacity-scheduler.default-supports-priority + false + If true, priorities of jobs will be taken into + account in scheduling decisions by default in a job queue. + + + + + mapred.capacity-scheduler.default-minimum-user-limit-percent + 100 + The percentage of the resources limited to a particular user + for the job queue at any given point of time by default. + + + + + + mapred.capacity-scheduler.default-user-limit-factor + 1 + The default multiple of queue-capacity which is used to + determine the amount of slots a single user can consume concurrently. + + + + + mapred.capacity-scheduler.default-maximum-active-tasks-per-queue + 200000 + The default maximum number of tasks, across all jobs in the + queue, which can be initialized concurrently. Once the queue's jobs exceed + this limit they will be queued on disk. + + + + + mapred.capacity-scheduler.default-maximum-active-tasks-per-user + 100000 + The default maximum number of tasks per-user, across all the of + the user's jobs in the queue, which can be initialized concurrently. Once + the user's jobs exceed this limit they will be queued on disk. + + + + + mapred.capacity-scheduler.default-init-accept-jobs-factor + 10 + The default multipe of (maximum-system-jobs * queue-capacity) + used to determine the number of jobs which are accepted by the scheduler. + + + + + + mapred.capacity-scheduler.init-poll-interval + 5000 + The amount of time in miliseconds which is used to poll + the job queues for jobs to initialize. 
+ + + + mapred.capacity-scheduler.init-worker-threads + 5 + Number of worker threads which would be used by + Initialization poller to initialize jobs in a set of queue. + If number mentioned in property is equal to number of job queues + then a single thread would initialize jobs in a queue. If lesser + then a thread would get a set of queues assigned. If the number + is greater then number of threads would be equal to number of + job queues. + + + + Added: hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/commons-logging.properties URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/commons-logging.properties?rev=1169567&view=auto ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/commons-logging.properties (added) +++ hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/commons-logging.properties Sun Sep 11 22:53:52 2011 @@ -0,0 +1,7 @@ +#Logging Implementation + +#Log4J +org.apache.commons.logging.Log=org.apache.commons.logging.impl.Log4JLogger + +#JDK Logger +#org.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger Modified: hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/core-site.xml URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/core-site.xml?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/core-site.xml (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/core-site.xml Sun Sep 11 22:53:52 2011 @@ -4,8 +4,75 @@ + + + local.realm + ${KERBEROS_REALM} + + + + fs.default.name - ${HADOOP_NN_HOST} + hdfs://${HADOOP_NN_HOST}:8020 + The name of the default file system. Either the + literal string "local" or a host:port for NDFS. + + true + + + + fs.trash.interval + 360 + Number of minutes between trash checkpoints. + If zero, the trash feature is disabled. + + + + + hadoop.security.auth_to_local + + RULE:[2:$1@$0]([jt]t@.*${KERBEROS_REALM})s/.*/${HADOOP_MR_USER}/ + RULE:[2:$1@$0]([nd]n@.*${KERBEROS_REALM})s/.*/${HADOOP_HDFS_USER}/ + RULE:[2:$1@$0](mapred@.*${KERBEROS_REALM})s/.*/${HADOOP_MR_USER}/ + RULE:[2:$1@$0](hdfs@.*${KERBEROS_REALM})s/.*/${HADOOP_HDFS_USER}/ + RULE:[2:$1@$0](mapredqa@.*${KERBEROS_REALM})s/.*/${HADOOP_MR_USER}/ + RULE:[2:$1@$0](hdfsqa@.*${KERBEROS_REALM})s/.*/${HADOOP_HDFS_USER}/ + DEFAULT + + + + + + hadoop.security.authentication + ${SECURITY_TYPE} + + Set the authentication for the cluster. Valid values are: simple or + kerberos. + + + + hadoop.security.authorization + ${SECURITY} + + Enable authorization for different protocols. 
+ + + + + hadoop.security.groups.cache.secs + 14400 + + + + hadoop.kerberos.kinit.command + ${KINIT} + + + + hadoop.http.filter.initializers + org.apache.hadoop.http.lib.StaticUserWebFilter + + Modified: hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hadoop-env.sh URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hadoop-env.sh?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hadoop-env.sh (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hadoop-env.sh Sun Sep 11 22:53:52 2011 @@ -5,59 +5,50 @@ # set JAVA_HOME in this file, so that it is correctly defined on # remote nodes. -# The java implementation to use. Required. +# The java implementation to use. export JAVA_HOME=${JAVA_HOME} +export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"} -# Location where Hadoop is installed -export HADOOP_PREFIX=${HADOOP_PREFIX} - -# Extra Java CLASSPATH elements. Optional. -# export HADOOP_CLASSPATH= +# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler. +for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do + if [ "$HADOOP_CLASSPATH" ]; then + export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f + else + export HADOOP_CLASSPATH=$f + fi +done # The maximum amount of heap to use, in MB. Default is 1000. -# export HADOOP_HEAPSIZE=2000 +#export HADOOP_HEAPSIZE= +#export HADOOP_NAMENODE_INIT_HEAPSIZE="" # Extra Java runtime options. Empty by default. -# export HADOOP_OPTS=-server +export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true ${HADOOP_OPTS}" # Command specific options appended to HADOOP_OPTS when specified -export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS" -export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS" -export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS" -export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS" -export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS" -# export HADOOP_TASKTRACKER_OPTS= -# The following applies to multiple commands (fs, dfs, fsck, distcp etc) -# export HADOOP_CLIENT_OPTS +export HADOOP_NAMENODE_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}" +HADOOP_JOBTRACKER_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dmapred.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}" +HADOOP_TASKTRACKER_OPTS="-Dsecurity.audit.logger=ERROR,console -Dmapred.audit.logger=ERROR,console ${HADOOP_TASKTRACKER_OPTS}" +HADOOP_DATANODE_OPTS="-Dsecurity.audit.logger=ERROR,DRFAS ${HADOOP_DATANODE_OPTS}" -# Extra ssh options. Empty by default. -# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR" +export HADOOP_SECONDARYNAMENODE_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_SECONDARYNAMENODE_OPTS}" -# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default. -# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves +# The following applies to multiple commands (fs, dfs, fsck, distcp etc) +export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}" +#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData ${HADOOP_JAVA_PLATFORM_OPTS}" -# host:path where hadoop code should be rsync'd from. Unset by default. 
-# export HADOOP_MASTER=master:/home/$USER/src/hadoop +# On secure datanodes, user to run the datanode as after dropping privileges +export HADOOP_SECURE_DN_USER=${HADOOP_HDFS_USER} + +# Where log files are stored. $HADOOP_HOME/logs by default. +export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER -# Seconds to sleep between slave commands. Unset by default. This -# can be useful in large clusters, where, e.g., slave rsyncs can -# otherwise arrive faster than the master can service them. -# export HADOOP_SLAVE_SLEEP=0.1 +# Where log files are stored in the secure data environment. +export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER} # The directory where pid files are stored. /tmp by default. -HADOOP_PID_DIR=${HADOOP_PID_DIR} -export HADOOP_PID_DIR=${HADOOP_PID_DIR:-$HADOOP_PREFIX/var/run} +export HADOOP_PID_DIR=${HADOOP_PID_DIR} +export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR} # A string representing this instance of hadoop. $USER by default. -export HADOOP_IDENT_STRING=`whoami` - -# The scheduling priority for daemon processes. See 'man nice'. -# export HADOOP_NICENESS=10 - -# Where log files are stored. $HADOOP_HOME/logs by default. -HADOOP_LOG_DIR=${HADOOP_LOG_DIR} -export HADOOP_LOG_DIR=${HADOOP_LOG_DIR:-$HADOOP_PREFIX/var/log} - -# Hadoop configuration directory -HADOOP_CONF_DIR=${HADOOP_CONF_DIR} -export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$HADOOP_PREFIX/etc/hadoop} +export HADOOP_IDENT_STRING=$USER Added: hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hadoop-policy.xml URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hadoop-policy.xml?rev=1169567&view=auto ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hadoop-policy.xml (added) +++ hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hadoop-policy.xml Sun Sep 11 22:53:52 2011 @@ -0,0 +1,118 @@ + + + + + + + + security.client.protocol.acl + * + ACL for ClientProtocol, which is used by user code + via the DistributedFileSystem. + The ACL is a comma-separated list of user and group names. The user and + group list is separated by a blank. For e.g. "alice,bob users,wheel". + A special value of "*" means all users are allowed. + + + + security.client.datanode.protocol.acl + * + ACL for ClientDatanodeProtocol, the client-to-datanode protocol + for block recovery. + The ACL is a comma-separated list of user and group names. The user and + group list is separated by a blank. For e.g. "alice,bob users,wheel". + A special value of "*" means all users are allowed. + + + + security.datanode.protocol.acl + * + ACL for DatanodeProtocol, which is used by datanodes to + communicate with the namenode. + The ACL is a comma-separated list of user and group names. The user and + group list is separated by a blank. For e.g. "alice,bob users,wheel". + A special value of "*" means all users are allowed. + + + + security.inter.datanode.protocol.acl + * + ACL for InterDatanodeProtocol, the inter-datanode protocol + for updating generation timestamp. + The ACL is a comma-separated list of user and group names. The user and + group list is separated by a blank. For e.g. "alice,bob users,wheel". + A special value of "*" means all users are allowed. + + + + security.namenode.protocol.acl + * + ACL for NamenodeProtocol, the protocol used by the secondary + namenode to communicate with the namenode. 
+ The ACL is a comma-separated list of user and group names. The user and + group list is separated by a blank. For e.g. "alice,bob users,wheel". + A special value of "*" means all users are allowed. + + + + security.inter.tracker.protocol.acl + * + ACL for InterTrackerProtocol, used by the tasktrackers to + communicate with the jobtracker. + The ACL is a comma-separated list of user and group names. The user and + group list is separated by a blank. For e.g. "alice,bob users,wheel". + A special value of "*" means all users are allowed. + + + + security.job.submission.protocol.acl + * + ACL for JobSubmissionProtocol, used by job clients to + communciate with the jobtracker for job submission, querying job status etc. + The ACL is a comma-separated list of user and group names. The user and + group list is separated by a blank. For e.g. "alice,bob users,wheel". + A special value of "*" means all users are allowed. + + + + security.task.umbilical.protocol.acl + * + ACL for TaskUmbilicalProtocol, used by the map and reduce + tasks to communicate with the parent tasktracker. + The ACL is a comma-separated list of user and group names. The user and + group list is separated by a blank. For e.g. "alice,bob users,wheel". + A special value of "*" means all users are allowed. + + + + security.admin.operations.protocol.acl + ${HADOOP_HDFS_USER} + ACL for AdminOperationsProtocol. Used for admin commands. + The ACL is a comma-separated list of user and group names. The user and + group list is separated by a blank. For e.g. "alice,bob users,wheel". + A special value of "*" means all users are allowed. + + + + security.refresh.usertogroups.mappings.protocol.acl + ${HADOOP_HDFS_USER} + ACL for RefreshUserMappingsProtocol. Used to refresh + users mappings. The ACL is a comma-separated list of user and + group names. The user and group list is separated by a blank. For + e.g. "alice,bob users,wheel". A special value of "*" means all + users are allowed. + + + + security.refresh.policy.protocol.acl + ${HADOOP_HDFS_USER} + ACL for RefreshAuthorizationPolicyProtocol, used by the + dfsadmin and mradmin commands to refresh the security policy in-effect. + The ACL is a comma-separated list of user and group names. The user and + group list is separated by a blank. For e.g. "alice,bob users,wheel". + A special value of "*" means all users are allowed. + + + + + Modified: hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hdfs-site.xml URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hdfs-site.xml?rev=1169567&r1=1169566&r2=1169567&view=diff ============================================================================== --- hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hdfs-site.xml (original) +++ hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/hdfs-site.xml Sun Sep 11 22:53:52 2011 @@ -1,23 +1,225 @@ - - - - dfs.replication - ${HADOOP_REPLICATION} - + + + dfs.name.dir ${HADOOP_NN_DIR} + Determines where on the local filesystem the DFS name node + should store the name table. If this is a comma-delimited list + of directories then the name table is replicated in all of the + directories, for redundancy. + true + dfs.data.dir ${HADOOP_DN_DIR} + Determines where on the local filesystem an DFS data node + should store its blocks. If this is a comma-delimited + list of directories, then data will be stored in all named + directories, typically on different devices. 
+ Directories that do not exist are ignored. + + true + + + + dfs.safemode.threshold.pct + 1.0f + + Specifies the percentage of blocks that should satisfy + the minimal replication requirement defined by dfs.replication.min. + Values less than or equal to 0 mean not to start in safe mode. + Values greater than 1 will make safe mode permanent. + + + + + dfs.datanode.address + ${HADOOP_DN_ADDR} + + + + dfs.datanode.http.address + ${HADOOP_DN_HTTP_ADDR} + + + + dfs.http.address + ${HADOOP_NN_HOST}:50070 + The name of the default file system. Either the + literal string "local" or a host:port for NDFS. + + true + + + + + dfs.umaskmode + 077 + + The octal umask used when creating files and directories. + + + + + dfs.block.access.token.enable + ${SECURITY} + + Are access tokens are used as capabilities for accessing datanodes. + + + + + dfs.namenode.kerberos.principal + nn/_HOST@${local.realm} + + Kerberos principal name for the NameNode + + + + + dfs.secondary.namenode.kerberos.principal + nn/_HOST@${local.realm} + + Kerberos principal name for the secondary NameNode. + + + + + dfs.namenode.kerberos.https.principal + host/_HOST@${local.realm} + + The Kerberos principal for the host that the NameNode runs on. + + + + + dfs.secondary.namenode.kerberos.https.principal + host/_HOST@${local.realm} + + The Kerberos principal for the hostthat the secondary NameNode runs on. + + + + + dfs.secondary.https.port + 50490 + The https port where secondary-namenode binds + + + + + dfs.datanode.kerberos.principal + dn/_HOST@${local.realm} + + The Kerberos principal that the DataNode runs as. "_HOST" is replaced by + the real host name. + + + + + dfs.namenode.keytab.file + /etc/security/keytabs/nn.service.keytab + + Combined keytab file containing the namenode service and host principals. + + + + + dfs.secondary.namenode.keytab.file + /etc/security/keytabs/nn.service.keytab + + Combined keytab file containing the namenode service and host principals. + + + + + dfs.datanode.keytab.file + /etc/security/keytabs/dn.service.keytab + + The filename of the keytab file for the DataNode. + + + + + dfs.https.port + 50470 + The https port where namenode binds + + + + dfs.https.address + ${HADOOP_NN_HOST}:50470 + The https address where namenode binds + + + + dfs.datanode.data.dir.perm + 700 + The permissions that should be there on dfs.data.dir + directories. The datanode will not come up if the permissions are + different on existing dfs.data.dir directories. If the directories + don't exist, they will be created with this permission. + + + + + dfs.cluster.administrators + ${HADOOP_HDFS_USER} + ACL for who all can view the default servlets in the HDFS + + + + dfs.permissions.superusergroup + ${HADOOP_GROUP} + The name of the group of super-users. + + + + dfs.namenode.http-address + ${HADOOP_NN_HOST}:50070 + + The address and the base port where the dfs namenode web ui will listen on. + If the port is 0 then the server will start on a free port. + + + + + dfs.namenode.https-address + ${HADOOP_NN_HOST}:50470 + + + + dfs.secondary.http.address + ${HADOOP_SNN_HOST}:50090 + + The secondary namenode http server address and port. + If the port is 0 then the server will start on a free port. + + + + + dfs.hosts + ${HADOOP_CONF_DIR}/dfs.include + Names a file that contains a list of hosts that are + permitted to connect to the namenode. The full pathname of the file + must be specified. If the value is empty, all hosts are + permitted. 
Added: hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/mapred-queue-acls.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/mapred-queue-acls.xml?rev=1169567&view=auto
==============================================================================
--- hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/mapred-queue-acls.xml (added)
+++ hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/mapred-queue-acls.xml Sun Sep 11 22:53:52 2011
@@ -0,0 +1,12 @@
+
+
+
+mapred.queue.default.acl-submit-job
+*
+
+
+mapred.queue.default.acl-administer-jobs
+*
+
+
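Both queue ACLs in this template default to "*" (everyone). To illustrate the
format only: a value of " hadoopusers" (empty user list, a blank, then a group
list) would limit the default queue to members of a hypothetical hadoopusers
group, after which the ACLs could be reloaded on a running JobTracker. The
-refreshQueueAcls option is assumed to be available in this branch's mradmin:

  # reload mapred-queue-acls.xml on a running jobtracker (option name is an assumption)
  hadoop mradmin -refreshQueueAcls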
Modified: hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/mapred-site.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/mapred-site.xml?rev=1169567&r1=1169566&r2=1169567&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/mapred-site.xml (original)
+++ hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/mapred-site.xml Sun Sep 11 22:53:52 2011
@@ -4,28 +4,265 @@
+
- mapred.job.tracker
- ${HADOOP_JT_HOST}
+ mapred.tasktracker.tasks.sleeptime-before-sigkill
+ 250
+ Normally, this is the amount of time before killing
+ processes, and the recommended default is 5 seconds, i.e. a value of
+ 5000 here. In this case, we are using it solely to blast tasks before
+ killing them, and killing them very quickly (1/4 second) to guarantee
+ that we do not leave VMs around for later jobs.
+
 mapred.system.dir
- /user/mapred/system
+ /mapred/mapredsystem
+ true
+
+
+
+ mapred.job.tracker
+ ${HADOOP_JT_HOST}:9000
+ true
+
+
+
+ mapred.job.tracker.http.address
+ ${HADOOP_JT_HOST}:50030
+ true
 mapred.local.dir
 ${HADOOP_MAPRED_DIR}
+ true
+
+
+
+ mapreduce.cluster.administrators
+ ${HADOOP_MR_USER}
+
+
+
+ mapred.map.tasks.speculative.execution
+ false
+ If true, then multiple instances of some map tasks
+ may be executed in parallel.
+
+
+
+ mapred.reduce.tasks.speculative.execution
+ false
+ If true, then multiple instances of some reduce tasks
+ may be executed in parallel.
+
+
+
+ mapred.output.compression.type
+ BLOCK
+ If the job outputs are to be compressed as SequenceFiles, how
+ should they be compressed? Should be one of NONE, RECORD or BLOCK.
+
+
+
+
+ jetty.connector
+ org.mortbay.jetty.nio.SelectChannelConnector
+
+
+
+ mapred.task.tracker.task-controller
+ ${TASK_CONTROLLER}
+
+
+
+ mapred.child.root.logger
+ INFO,TLA
+
+
+
+ stream.tmpdir
+ ${mapred.temp.dir}
+
+
+
+
+ mapred.child.java.opts
+ -server -Xmx640m -Djava.net.preferIPv4Stack=true
+
+
+
+ mapred.child.ulimit
+ 8388608
+
+
+
+ mapred.job.tracker.persist.jobstatus.active
+ true
+ Indicates if persistency of job status information is
+ active or not.
+
+
+
+
+ mapred.job.tracker.persist.jobstatus.dir
+ file:///${HADOOP_LOG_DIR}/${HADOOP_MR_USER}/jobstatus
+ The directory where the job status information is persisted
+ in a file system to be available after it drops off the memory queue and
+ between jobtracker restarts.
+
+
+
+
+ mapred.job.tracker.history.completed.location
+ /mapred/history/done
+
+
+
+ mapred.heartbeats.in.second
+ 200
+ To enable HADOOP-5784.
+
+
+
+ mapreduce.tasktracker.outofband.heartbeat
+ true
+ To enable MAPREDUCE-270.
+
+
+
+ mapred.jobtracker.maxtasks.per.job
+ 200000
+ true
+ The maximum number of tasks for a single job.
+ A value of -1 indicates that there is no maximum.
+
+
+
+
+ mapreduce.jobtracker.kerberos.principal
+ jt/_HOST@${local.realm}
+
+ JT principal.
+
+
+
+
+ mapreduce.tasktracker.kerberos.principal
+ tt/_HOST@${local.realm}
+
+ TT principal.
+
+
+
+
+
+ hadoop.job.history.user.location
+ none
+
+
+
+ mapreduce.jobtracker.keytab.file
+ /etc/security/keytabs/jt.service.keytab
+
+ The keytab for the jobtracker principal.
+
+
+
+
+ mapreduce.tasktracker.keytab.file
+ /etc/security/keytabs/tt.service.keytab
+ The filename of the keytab for the task tracker.
+
+
+
+ mapreduce.jobtracker.staging.root.dir
+ /user
+ The path prefix for where the staging directories should be
+ placed. The next level is always the user's
+ name. It is a path in the default file system.
+
+
+
+
+
+ mapreduce.job.acl-modify-job
+
- hadoop.tmp.dir
- /tmp
+ mapreduce.job.acl-view-job
+ Dr.Who
+ mapreduce.tasktracker.group
+ ${HADOOP_GROUP}
+ The group that the tasktracker uses for accessing the
+ task-controller binary. The mapred user must be a member and other
+ users should *not* be members.
+
+
+
+
+ mapred.acls.enabled
+ true
+
+
 mapred.jobtracker.taskScheduler
- ${HADOOP_TASK_SCHEDULER}
+ org.apache.hadoop.mapred.CapacityTaskScheduler
+
+
+ mapred.queue.names
+ default
+
+
+
+
+ mapreduce.history.server.embedded
+ false
+
+
+ mapreduce.history.server.http.address
+ ${HADOOP_JT_HOST}:51111
+
+
+ mapreduce.jobhistory.kerberos.principal
+ jt/_HOST@${local.realm}
+ History server principal.
+
+
+ mapreduce.jobhistory.keytab.file
+ /etc/security/keytabs/jt.service.keytab
+
+ The keytab for the jobtracker principal.
+
+
+
+
+ mapred.hosts
+ ${HADOOP_CONF_DIR}/mapred.include
+ Names a file that contains the list of nodes that may
+ connect to the jobtracker. If the value is empty, all hosts are
+ permitted.
+
+
+
+ mapred.hosts.exclude
+ ${HADOOP_CONF_DIR}/mapred.exclude
+ Names a file that contains the list of hosts that
+ should be excluded by the jobtracker. If the value is empty, no
+ hosts are excluded.
+
+
+ mapred.jobtracker.retirejob.check
+ 10000
+
+
+ mapred.jobtracker.retirejob.interval
+ 0

Added: hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/taskcontroller.cfg
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/taskcontroller.cfg?rev=1169567&view=auto
==============================================================================
--- hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/taskcontroller.cfg (added)
+++ hadoop/common/branches/branch-0.20-security/src/packages/templates/conf/taskcontroller.cfg Sun Sep 11 22:53:52 2011
@@ -0,0 +1,3 @@
+mapreduce.cluster.local.dir=${HADOOP_MAPRED_DIR}
+mapreduce.tasktracker.group=${HADOOP_GROUP}
+hadoop.log.dir=${HADOOP_LOG_DIR}/${HADOOP_MR_USER}
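All of these templates rely on the setup scripts substituting the ${...}
placeholders. A simple post-install sanity check, assuming the rendered files
end up under /etc/hadoop (the path is an assumption; adjust to the actual
install layout):

  # flag any placeholder that survived substitution
  grep -n '\${' /etc/hadoop/*.xml /etc/hadoop/taskcontroller.cfg || echo "all placeholders substituted"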