hadoop-common-issues mailing list archives

From "FKorning (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7682) taskTracker could not start because "Failed to set permissions" to "ttprivate to 0700"
Date Fri, 23 Mar 2012 14:59:29 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236645#comment-13236645 ]

FKorning commented on HADOOP-7682:
----------------------------------


There are a bunch of issues at work here.  I've patched this up locally
on my own 1.0.2-SNAPSHOT, but it takes a lot of yak-shaving to fix.



--- 

First you need to set up hadoop-1.0.1 (including the source), ant, ivy,
and cygwin with ssh/ssl and tcp_wrappers.
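For example, Cygwin's setup.exe can install the needed packages
unattended (a sketch; the installer filename varies with the Cygwin
version):

  setup.exe -q -P openssh,openssl,tcp_wrappers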

Then use ssh-host-config to create a privileged cyg_server user.
From an admin cygwin shell, you then have to edit the /etc/passwd
file and give that user a valid shell and home directory, change the
user's password, and finally generate ssh keys for the user and copy
the user's id_rsa.pub public key into ~/.ssh/authorized_keys.

If done right, you should be able to ssh cyg_server@localhost.
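Roughly, from an elevated cygwin shell (a sketch; details will vary):

  ssh-host-config -y                        # creates the cyg_server service user
  mkpasswd -l > /etc/passwd                 # regenerate local passwd entries
  # now give cyg_server a real home and shell in /etc/passwd, e.g.
  #   cyg_server:...:/home/cyg_server:/bin/bash
  passwd cyg_server                         # set a known password
  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys
  ssh cyg_server@localhost                  # should now log you in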


--- 

Now the main problem is a confusion between the hadoop shell scripts,
which expect unix paths like /tmp, and the hadoop java binaries, which
interpret that path as C:\tmp.

Unfortunately, neither Cygwin symlinks nor even Windows NT junctions
are supported by the java io filesystem.  Thus the only way to get
around this is to force the cygwin paths to be identical to the
windows paths.

I get around this by creating a circular symlink "/cygwin" -> "/".
To avoid confusion with "C:" drive mappings, all my paths are relative.
This means that windows "\cygwin\tmp" and cygwin's "/cygwin/tmp" name
the same directory (see the sketch below).

For pid files use /cygwin/tmp/
For tmp files use /cygwin/tmp/hadoop-${USER}/
For log files use /cygwin/tmp/hadoop-${USER}/logs/
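A minimal sketch of that setup, assuming Cygwin is installed at
C:\cygwin (so cygwin's "/" is windows' "\cygwin" relative to the
drive root):

  ln -s / /cygwin                          # circular symlink: /cygwin -> /
  mkdir -p /cygwin/tmp/hadoop-${USER}/logs
  cygpath -w /cygwin/tmp                   # should print C:\cygwin\tmp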


--- 

First, the ssh slaves invocation wrapper is broken, because it fails to
provide the user's ssh login, which cygwin's openssh does not default
for you.


slaves.sh:

for slave in `cat "$HOSTLIST"|sed "s/#.*$//;/^$/d"`; do
  # pass the login explicitly; cygwin's openssh won't assume $USER
  ssh -l $USER $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
    2>&1 | sed "s/^/$slave: /" &
  if [ "$HADOOP_SLAVE_SLEEP" != "" ]; then
    sleep $HADOOP_SLAVE_SLEEP
  fi
done


Next, the hadoop shell scripts are broken.  You need to fix the
environment for cygwin paths in hadoop-env.sh, and then make sure this
file is sourced by hadoop-config.sh and by the hadoop wrapper script
itself.  For me the JRE java invocation was also broken, so I provide
the whole script below.


hadoop-env.sh:

  HADOOP_PID_DIR=/cygwin/tmp/
  HADOOP_TMP_DIR=/cygwin/tmp/hadoop-${USER}
  HADOOP_LOG_DIR=/cygwin/tmp/hadoop-${USER}/logs



hadoop (sh):


#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# The Hadoop command script
#
# Environment Variables
#
#   JAVA_HOME        The java implementation to use.  Overrides JAVA_HOME.
#
#   HADOOP_CLASSPATH Extra Java CLASSPATH entries.
#
#   HADOOP_USER_CLASSPATH_FIRST      When defined, the HADOOP_CLASSPATH is 
#                                    added in the beginning of the global
#                                    classpath. Can be defined, for example,
#                                    by doing 
#                                    export HADOOP_USER_CLASSPATH_FIRST=true
#
#   HADOOP_HEAPSIZE  The maximum amount of heap to use, in MB. 
#                    Default is 1000.
#
#   HADOOP_OPTS      Extra Java runtime options.
#   
#   HADOOP_NAMENODE_OPTS       These options are added to HADOOP_OPTS 
#   HADOOP_CLIENT_OPTS         when the respective command is run.
#   HADOOP_{COMMAND}_OPTS etc  HADOOP_JT_OPTS applies to JobTracker 
#                              for e.g.  HADOOP_CLIENT_OPTS applies to 
#                              more than one command (fs, dfs, fsck, 
#                              dfsadmin etc)  
#
#   HADOOP_CONF_DIR  Alternate conf dir. Default is ${HADOOP_HOME}/conf.
#
#   HADOOP_ROOT_LOGGER The root appender. Default is INFO,console
#

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

cygwin=false
case "`uname`" in
CYGWIN*) cygwin=true;;
esac


if [ -e "$bin"/../libexec/hadoop-config.sh ]; then
  . "$bin"/../libexec/hadoop-config.sh
else
  . "$bin"/hadoop-config.sh
fi


# if no args specified, show usage
if [ $# = 0 ]; then
  echo "Usage: hadoop [--config confdir] COMMAND"
  echo "where COMMAND is one of:"
  echo "  namenode -format     format the DFS filesystem"
  echo "  secondarynamenode    run the DFS secondary namenode"
  echo "  namenode             run the DFS namenode"
  echo "  datanode             run a DFS datanode"
  echo "  dfsadmin             run a DFS admin client"
  echo "  mradmin              run a Map-Reduce admin client"
  echo "  fsck                 run a DFS filesystem checking utility"
  echo "  fs                   run a generic filesystem user client"
  echo "  balancer             run a cluster balancing utility"
  echo "  fetchdt              fetch a delegation token from the NameNode"
  echo "  jobtracker           run the MapReduce job Tracker node" 
  echo "  pipes                run a Pipes job"
  echo "  tasktracker          run a MapReduce task Tracker node" 
  echo "  historyserver        run job history servers as a standalone daemon"
  echo "  job                  manipulate MapReduce jobs"
  echo "  queue                get information regarding JobQueues" 
  echo "  version              print the version"
  echo "  jar <jar>            run a jar file"
  echo "  distcp <srcurl> <desturl> copy file or directories recursively"
  echo "  archive -archiveName NAME -p <parent path> <src>* <dest> create
a hadoop archive"
  echo "  classpath            prints the class path needed to get the"
  echo "                       Hadoop jar and the required libraries"
  echo "  daemonlog            get/set the log level for each daemon"
  echo " or"
  echo "  CLASSNAME            run the class named CLASSNAME"
  echo "Most commands print help when invoked w/o parameters."
  exit 1
fi

# get arguments
COMMAND=$1
shift

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$COMMAND" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER"
]; then
  HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
  HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
  HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
  starting_secure_dn="true"
fi

if [ "$JAVA_HOME" != "" ]; then
  #echo "JAVA_HOME: $JAVA_HOME"
  JAVA_HOME="$JAVA_HOME"
fi
# some Java parameters
if $cygwin; then
  JAVA_HOME=`cygpath -w "$JAVA_HOME"`
  #echo "cygwin JAVA_HOME: $JAVA_HOME"  
fi
  if [ "$JAVA_HOME" == "" ]; then
  echo "Error: JAVA_HOME is not set: $JAVA_HOME"
  exit 1
fi

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m 

# check envvars which might override default args
if [ "$HADOOP_HEAPSIZE" != "" ]; then
  #echo "run with heapsize $HADOOP_HEAPSIZE"
  JAVA_HEAP_MAX="-Xmx""$HADOOP_HEAPSIZE""m"
  #echo $JAVA_HEAP_MAX
fi

# CLASSPATH initially contains $HADOOP_CONF_DIR
CLASSPATH="${HADOOP_CONF_DIR}"
if [ "$HADOOP_USER_CLASSPATH_FIRST" != "" ] && [ "$HADOOP_CLASSPATH" != "" ] ; then
  CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
fi
CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar

# for developers, add Hadoop classes to CLASSPATH
if [ -d "$HADOOP_HOME/build/classes" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/classes
fi
if [ -d "$HADOOP_HOME/build/webapps" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build
fi
if [ -d "$HADOOP_HOME/build/test/classes" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/test/classes
fi
if [ -d "$HADOOP_HOME/build/tools" ]; then
  CLASSPATH=${CLASSPATH}:$HADOOP_HOME/build/tools
fi

# so that filenames w/ spaces are handled correctly in loops below
IFS=

# for releases, add core hadoop jar & webapps to CLASSPATH
if [ -e $HADOOP_PREFIX/share/hadoop/hadoop-core-* ]; then
  # binary layout
  if [ -d "$HADOOP_PREFIX/share/hadoop/webapps" ]; then
    CLASSPATH=${CLASSPATH}:$HADOOP_PREFIX/share/hadoop
  fi
  for f in $HADOOP_PREFIX/share/hadoop/hadoop-core-*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done

  # add libs to CLASSPATH
  for f in $HADOOP_PREFIX/share/hadoop/lib/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done

  for f in $HADOOP_PREFIX/share/hadoop/lib/jsp-2.1/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done

  for f in $HADOOP_PREFIX/share/hadoop/hadoop-tools-*.jar; do
    TOOL_PATH=${TOOL_PATH}:$f;
  done
else
  # tarball layout
  if [ -d "$HADOOP_HOME/webapps" ]; then
    CLASSPATH=${CLASSPATH}:$HADOOP_HOME
  fi
  for f in $HADOOP_HOME/hadoop-core-*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done

  # add libs to CLASSPATH
  for f in $HADOOP_HOME/lib/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done

  if [ -d "$HADOOP_HOME/build/ivy/lib/Hadoop/common" ]; then
    for f in $HADOOP_HOME/build/ivy/lib/Hadoop/common/*.jar; do
      CLASSPATH=${CLASSPATH}:$f;
    done
  fi

  for f in $HADOOP_HOME/lib/jsp-2.1/*.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done

  for f in $HADOOP_HOME/hadoop-tools-*.jar; do
    TOOL_PATH=${TOOL_PATH}:$f;
  done
  for f in $HADOOP_HOME/build/hadoop-tools-*.jar; do
    TOOL_PATH=${TOOL_PATH}:$f;
  done
fi

# add user-specified CLASSPATH last
if [ "$HADOOP_USER_CLASSPATH_FIRST" = "" ] && [ "$HADOOP_CLASSPATH" != "" ]; then
  CLASSPATH=${CLASSPATH}:${HADOOP_CLASSPATH}
fi

# default log directory & file
if [ "$HADOOP_LOG_DIR" = "" ]; then
  HADOOP_LOG_DIR="$HADOOP_HOME/logs"
fi
if [ "$HADOOP_LOGFILE" = "" ]; then
  HADOOP_LOGFILE='hadoop.log'
fi

# default policy file for service-level authorization
if [ "$HADOOP_POLICYFILE" = "" ]; then
  HADOOP_POLICYFILE="hadoop-policy.xml"
fi

# restore ordinary behaviour
unset IFS

# figure out which class to run
if [ "$COMMAND" = "classpath" ] ; then
  if $cygwin; then
    CLASSPATH=`cygpath -wp "$CLASSPATH"`
  fi
  echo $CLASSPATH
  exit
elif [ "$COMMAND" = "namenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"
elif [ "$COMMAND" = "secondarynamenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_SECONDARYNAMENODE_OPTS"
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  if [ "$starting_secure_dn" = "true" ]; then
    HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
  else
    HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
  fi
elif [ "$COMMAND" = "fs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfsadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "mradmin" ] ; then
  CLASS=org.apache.hadoop.mapred.tools.MRAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "fsck" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSck
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "balancer" ] ; then
  CLASS=org.apache.hadoop.hdfs.server.balancer.Balancer
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_BALANCER_OPTS"
elif [ "$COMMAND" = "fetchdt" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DelegationTokenFetcher
elif [ "$COMMAND" = "jobtracker" ] ; then
  CLASS=org.apache.hadoop.mapred.JobTracker
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOBTRACKER_OPTS"
elif [ "$COMMAND" = "historyserver" ] ; then
  CLASS=org.apache.hadoop.mapred.JobHistoryServer
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOB_HISTORYSERVER_OPTS"
elif [ "$COMMAND" = "tasktracker" ] ; then
  CLASS=org.apache.hadoop.mapred.TaskTracker
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_TASKTRACKER_OPTS"
elif [ "$COMMAND" = "job" ] ; then
  CLASS=org.apache.hadoop.mapred.JobClient
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "queue" ] ; then
  CLASS=org.apache.hadoop.mapred.JobQueueClient
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "pipes" ] ; then
  CLASS=org.apache.hadoop.mapred.pipes.Submitter
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "version" ] ; then
  CLASS=org.apache.hadoop.util.VersionInfo
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "jar" ] ; then
  CLASS=org.apache.hadoop.util.RunJar
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "distcp" ] ; then
  CLASS=org.apache.hadoop.tools.DistCp
  CLASSPATH=${CLASSPATH}:${TOOL_PATH}
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "daemonlog" ] ; then
  CLASS=org.apache.hadoop.log.LogLevel
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "archive" ] ; then
  CLASS=org.apache.hadoop.tools.HadoopArchives
  CLASSPATH=${CLASSPATH}:${TOOL_PATH}
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "sampler" ] ; then
  CLASS=org.apache.hadoop.mapred.lib.InputSampler
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
else
  CLASS=$COMMAND
fi


# cygwin path translation
if $cygwin; then
  JAVA_HOME=`cygpath -w "$JAVA_HOME"`
  CLASSPATH=`cygpath -wp "$CLASSPATH"`
  HADOOP_HOME=`cygpath -w "$HADOOP_HOME"`
  HADOOP_LOG_DIR=`cygpath -w "$HADOOP_LOG_DIR"`
  TOOL_PATH=`cygpath -wp "$TOOL_PATH"`
fi

# setup 'java.library.path' for native-hadoop code if necessary
JAVA_LIBRARY_PATH=''


if [ -d "${HADOOP_HOME}/build/native" -o -d "${HADOOP_HOME}/lib/native" -o -e "${HADOOP_PREFIX}/lib/libhadoop.a"
]; then
  JAVA_PLATFORM=`${JAVA} -classpath ${CLASSPATH} -Xmx32m ${HADOOP_JAVA_PLATFORM_OPTS} org.apache.hadoop.util.PlatformName
| sed -e "s/ /_/g"`
  #echo "JAVA_PLATFORM: $JAVA_PLATFORM"
  
  if [ "$JAVA_PLATFORM" = "Windows_7-amd64-64" ]; then
    JSVC_ARCH="amd64"
  elif [ "$JAVA_PLATFORM" = "Linux-amd64-64" ]; then
    JSVC_ARCH="amd64"
  else
    JSVC_ARCH="i386"
  fi

  if [ -d "$HADOOP_HOME/build/native" ]; then
    JAVA_LIBRARY_PATH=${HADOOP_HOME}/build/native/${JAVA_PLATFORM}/lib
  fi
  
  if [ -d "${HADOOP_HOME}/lib/native" ]; then
    if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
      JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
    else
      JAVA_LIBRARY_PATH=${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}
    fi
  fi

  if [ -e "${HADOOP_PREFIX}/lib/libhadoop.a" ]; then
    JAVA_LIBRARY_PATH=${HADOOP_PREFIX}/lib
  fi
fi

# cygwin path translation
if $cygwin; then
  JAVA_LIBRARY_PATH=`cygpath -wp "$JAVA_LIBRARY_PATH"`
  PATH="/cygwin/bin:/cygwin/usr/bin:`cygpath -p ${PATH}`"
fi

HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.tmp.dir=$HADOOP_TMP_DIR"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.file=$HADOOP_LOGFILE"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.home.dir=$HADOOP_HOME"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger=${HADOOP_ROOT_LOGGER:-INFO,console}"

#turn security logger on the namenode and jobtracker only
if [ $COMMAND = "namenode" ] || [ $COMMAND = "jobtracker" ]; then
  HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,DRFAS}"
else
  HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"
fi

if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
  HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
fi  
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.policy.file=$HADOOP_POLICYFILE"

# Check to see if we should start a secure datanode
if [ "$starting_secure_dn" = "true" ]; then
  if [ "$HADOOP_PID_DIR" = "" ]; then
    HADOOP_SECURE_DN_PID="/tmp/hadoop_secure_dn.pid"
  else
    HADOOP_SECURE_DN_PID="$HADOOP_PID_DIR/hadoop_secure_dn.pid"
  fi

  exec "$HADOOP_HOME/libexec/jsvc.${JSVC_ARCH}" -Dproc_$COMMAND -outfile "$HADOOP_LOG_DIR/jsvc.out"
\
                                                -errfile "$HADOOP_LOG_DIR/jsvc.err" \
                                                -pidfile "$HADOOP_SECURE_DN_PID" \
                                                -nodetach \
                                                -user "$HADOOP_SECURE_DN_USER" \
                                                -cp "$CLASSPATH" \
                                                $JAVA_HEAP_MAX $HADOOP_OPTS \
                                                org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter
"$@"
else
  # run it
  exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS
"$@"
fi
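
With the script in place, a quick smoke test from a cygwin shell
(a sketch):

  bin/hadoop version        # should print the version banner
  bin/hadoop classpath      # should print a windows-style (semicolon-separated) classpath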




----


Next, the hadoop fs utilities are broken, as they expect shells with
POSIX /bin executables in their path (bash, chmod, chown, chgrp).
For various reasons it's a really bad idea to add "/cygwin/bin" to your
windows PATH, so we're going to have to fix the utility classes to
be cygwin-aware and use the "/cygwin/bin" binaries instead.

This is why you need the source: we're going to have to fix the java
source and recompile the hadoop core libraries (and this is why you
need ant and ivy).

----

Before we do this: the contrib Gridmix is broken, as it uses generic
Enum code that breaks under JDK/JRE 1.7 and above.  The fix is to dumb
it down to raw (untyped) Enums.


Gridmix.java:

/*  
  private <T> String getEnumValues(Enum<? extends T>[] e) {
    StringBuilder sb = new StringBuilder();
    String sep = "";
    for (Enum<? extends T> v : e) {
      sb.append(sep);
      sb.append(v.name());
      sep = "|";
    }
    return sb.toString();
  }
*/
  private String getEnumValues(Enum[] e) {
    StringBuilder sb = new StringBuilder();
    String sep = "";
    for (Enum v : e) {
      sb.append(sep);
      sb.append(v.name());
      sep = "|";
    }
    return sb.toString();
  }


---

Next, the ivy build.xml and build-contrib.xml scripts are broken, as
they fail to set the correct compiler version (javac.version=1.7)
everywhere.

Modify all of these to include the following in all javac targets:


build-contrib.xml:


  <property name="javac.debug" value="on"/>
  <property name="javac.version" value="1.7"/>

  ...

  <!-- ====================================================== -->
  <!-- Compile a Hadoop contrib's files                       -->
  <!-- ====================================================== -->
  <target name="compile" depends="init, ivy-retrieve-common" unless="skip.contrib">
    <echo message="contrib: ${name}"/>
    <javac
     encoding="${build.encoding}"
     srcdir="${src.dir}"
     includes="**/*.java"
     destdir="${build.classes}"
     target="${javac.version}"
     source="${javac.version}"
     optimize="${javac.optimize}"
     debug="${javac.debug}"
     deprecation="${javac.deprecation}">
     <classpath refid="contrib-classpath"/>
    </javac>
  </target>
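
Where the javac tasks already read ${javac.version}, you can also
override the property from the command line instead of editing every
file (a sketch; command-line -D properties take precedence in ant):

  ant -Djavac.version=1.7 -f build.xml compile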

---

Next we fix the hadoop utilities Shell.java to use cygwin paths:


Shell.java:

  /** Set to true on Windows platforms */
  public static final boolean WINDOWS /* borrowed from Path.WINDOWS */
                = System.getProperty("os.name").startsWith("Windows");
  
  /** a Unix command to get the current user's name */
  public final static String USER_NAME_COMMAND = (WINDOWS ? "/cygwin/bin/whoami" : "whoami");
  
  /** a Unix command to get the current user's groups list */
  public static String[] getGroupsCommand() {
    return new String[]{ (WINDOWS ? "/cygwin/bin/bash" : "bash"), "-c", "groups"};
  }
  
  /** a Unix command to get a given user's groups list */
  public static String[] getGroupsForUserCommand(final String user) {
    //'groups username' command return is non-consistent across different unixes
    return new String [] {(WINDOWS ? "/cygwin/bin/bash" : "bash"), "-c", "id -Gn " + user};
  }
  
  /** a Unix command to get a given netgroup's user list */
  public static String[] getUsersForNetgroupCommand(final String netgroup) {
    //'groups username' command return is non-consistent across different unixes
    return new String [] {(WINDOWS ? "/cygwin/bin/bash" : "bash"), "-c", "getent netgroup " + netgroup};
  }

  
  /** Return a Unix command to get permission information. */
  public static String[] getGET_PERMISSION_COMMAND() {
    //force /bin/ls, except on windows.
    return new String[] {(WINDOWS ? "/cygwin/bin/ls" : "/bin/ls"), "-ld"};
  }
  
  
  /** a Unix command to set permission */
  public static final String SET_PERMISSION_COMMAND = (WINDOWS ? "/cygwin/bin/chmod" : "chmod");
  
  /** a Unix command to set owner */
  public static final String SET_OWNER_COMMAND = (WINDOWS ? "/cygwin/bin/chown" : "chown");

  /** a Unix command to set group */
  public static final String SET_GROUP_COMMAND = (WINDOWS ? "/cygwin/bin/chgrp" : "chgrp");

  /** a Unix command to get ulimit of a process. */
  public static final String ULIMIT_COMMAND = "ulimit";


----

Lastly, and despite this fix, hadoop filesystem's FileUtil complains
about RawLocalFileSystem, breaking during directory creation and
verification because the shell's return value is improperly parsed.

You can fix this in a number of ways.  I took the lazy approach and
just made all mkdirs functions catch IOExceptions silently.


RawLocalFileSystem.java:

  /**
   * Creates the specified directory hierarchy. Does not
   * treat existence as an error.
   */
  public boolean mkdirs(Path f) throws IOException {
    boolean b = false;
    try {
      Path parent = f.getParent();
      File p2f = pathToFile(f);
      b = (parent == null || mkdirs(parent))
          && (p2f.mkdir() || p2f.isDirectory());
    } catch (IOException e) {
      // lazy fix: swallow the bogus permission failures on cygwin
    }
    return b;
  }

  /** {@inheritDoc} */
  @Override
  public boolean mkdirs(Path f, FsPermission permission) throws IOException {
    boolean b = false;
    try {
      b = mkdirs(f);
      setPermission(f, permission);
    } catch (IOException e) {
      // ditto: ignore setPermission failures on windows
    }
    return b;
  }


---


Finally, rebuild hadoop with "ant -f build.xml compile", and copy the
jars from the build directory over the existing jars in the hadoop
home directory.

Then reformat the namenode and run start-all.sh.
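
Concretely, something like this (a sketch; jar names and build layout
depend on your version):

  cd $HADOOP_HOME
  ant -f build.xml compile
  cp build/hadoop-core-*.jar .   # assuming the build dropped jars under build/
  bin/hadoop namenode -format
  bin/start-all.sh
  jps                            # expect NameNode, DataNode, JobTracker, TaskTracker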


You should see 4 java processes, for the namenode, datanode,
jobtracker, and tasktracker.  That was a lot of yak shaving just to
get this running.





                
> taskTracker could not start because "Failed to set permissions" to "ttprivate to 0700"
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7682
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7682
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.20.203.0, 0.20.205.0, 1.0.0
>         Environment: OS:WindowsXP SP3 , Filesystem :NTFS, cygwin 1.7.9-1, jdk1.6.0_05
>            Reporter: Magic Xie
>
> ERROR org.apache.hadoop.mapred.TaskTracker:Can not start task tracker because java.io.IOException:Failed to set permissions of path:/tmp/hadoop-cyg_server/mapred/local/ttprivate to 0700
>     at org.apache.hadoop.fs.RawLocalFileSystem.checkReturnValue(RawLocalFileSystem.java:525)
>     at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:499)
>     at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
>     at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
>     at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:635)
>     at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1328)
>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3430)
> Since hadoop 0.20.203, when the TaskTracker initializes, it checks the permissions
> (TaskTracker line 624) of (org.apache.hadoop.mapred.TaskTracker.TT_LOG_TMP_DIR,
> org.apache.hadoop.mapred.TaskTracker.TT_PRIVATE_DIR, org.apache.hadoop.mapred.TaskTracker.TT_PRIVATE_DIR).
> RawLocalFileSystem (http://svn.apache.org/viewvc/hadoop/common/tags/release-0.20.203.0/src/core/org/apache/hadoop/fs/RawLocalFileSystem.java?view=markup)
> calls setPermission (line 481) to deal with it.  setPermission works fine on *nix;
> however, it does not always work on windows.
> setPermission calls setReadable of java.io.File in line 498, but according to
> Table 1 in the article below, provided by oracle, setReadable(false) will always
> return false on windows, the same as setExecutable(false).
> http://java.sun.com/developer/technicalArticles/J2SE/Desktop/javase6/enhancements/
> Is this what causes the taskTracker "Failed to set permissions" of "ttprivate to 0700"?
> Hadoop 0.20.202 works fine in the same environment. 


        
