hadoop-user mailing list archives

From "Erravelli, Venkat" <venkat.errave...@baml.com>
Subject submitting a mapreduce job to remote cluster
Date Wed, 28 Nov 2012 15:53:37 GMT
Hello:

I see the below exception when I submit a MapReduce job from a standalone Java application to
a remote Hadoop cluster. The cluster's authentication mechanism is Kerberos.

Below is the code. I am using user impersonation since I need to submit the job as a Hadoop
cluster user (userx) from my machine, on which I am logged in as user99. So:

userx -- the user that is set up on the Hadoop cluster.
user99 -- the user on whose machine the standalone Java application code is executing.

            System.setProperty("HADOOP_USER_NAME", "userx");

            final Configuration conf = new Configuration();

            conf.set("hadoop.security.auth_to_local",
                        "RULE:[1:$1@$0](.*@\\Q\\E$)s/@\\Q\\E$//"
                                    + "RULE:[2:$1@$0](.*@\\Q\\E$)s/@\\Q\\E$//"
                                    + "DEFAULT");

            conf.set("mapred.job.tracker", "abcde.yyyy.com:9921");

            conf.set("fs.defaultFS", "hdfs://xxxxx.yyyy.com:9920");

            UserGroupInformation.setConfiguration(conf);

            System.out.println("here ::::: " + UserGroupInformation.getCurrentUser());

            UserGroupInformation ugi = UserGroupInformation.createProxyUser("user99",
                        UserGroupInformation.getCurrentUser());
            AuthenticationMethod am = AuthenticationMethod.KERBEROS;
            ugi.setAuthenticationMethod(am);


            final Path inPath = new Path("/user/userx/test.txt");

            DateFormat df = new SimpleDateFormat("dd_MM_yyyy_hh_mm");
            StringBuilder sb = new StringBuilder();
            sb.append("wordcount_result_").append(df.format(new Date()));

            // out
            final Path outPath = new Path(sb.toString());

            ugi.doAs(new PrivilegedExceptionAction<UserGroupInformation>() {   // <<<<--------- throws exception here!!!

                  public UserGroupInformation run() throws Exception {
                        // Submit a job
                        // create a new job based on the configuration
                        Job job = new Job(conf, "word count remote");

                        job.setJarByClass(WordCountJob.class);
                        job.setMapperClass(TokenizerMapper.class);
                        job.setCombinerClass(IntSumReducer.class);
                        job.setReducerClass(IntSumReducer.class);
                        job.setOutputKeyClass(Text.class);
                        job.setOutputValueClass(IntWritable.class);
                        FileInputFormat.addInputPath(job, inPath);
                        FileOutputFormat.setOutputPath(job, outPath);

                        // this waits until the job completes
                        job.waitForCompletion(true);

                        if (job.isSuccessful()) {
                              System.out.println("Job completed successfully");
                        } else {
                              System.out.println("Job Failed");
                        }
                        return UserGroupInformation.getCurrentUser();

                  }
            });
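
For reference, the proxy-user pattern I have seen in the Hadoop security documentation looks
roughly like the sketch below; I am not certain my code above follows it (the principal
user99@RND.HDFS.COM and the keytab path are made-up placeholders, not values from my
environment):

            // Sketch only: the principal and keytab path below are placeholder
            // assumptions for illustration, not real values.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // The real user authenticates with its own Kerberos credentials...
            UserGroupInformation realUser = UserGroupInformation
                        .loginUserFromKeytabAndReturnUGI("user99@RND.HDFS.COM",
                                    "/path/to/user99.keytab");

            // ...and then impersonates the cluster user. Note the argument order:
            // first the user being impersonated, then the real (logged-in) user.
            UserGroupInformation proxyUgi =
                        UserGroupInformation.createProxyUser("userx", realUser);

            proxyUgi.doAs(new PrivilegedExceptionAction<Void>() {
                  public Void run() throws Exception {
                        // job setup and submission would go here, as in my code above
                        return null;
                  }
            });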

When my code above is executed, I get the exception below on the line marked in it:
***************
12/11/28 09:43:51 ERROR security.UserGroupInformation: PriviledgedActionException as: user99
(auth:KERBEROS) via userx (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Authorization (hadoop.security.authorization) is enabled but authentication (hadoop.security.authentication)
is configured as simple. Please configure another method like kerberos or digest.
Exception in thread "Main Thread" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Authorization (hadoop.security.authorization) is enabled but authentication (hadoop.security.authentication)
is configured as simple. Please configure another method like kerberos or digest.
***************
Can someone tell me, or point me in the right direction on, what is going on here and how I
get past this exception? Any help will be greatly appreciated. Thanks!
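
One thing I notice re-reading the error text: the Configuration I build in code never sets
hadoop.security.authentication, so on the client side it presumably defaults to simple, while
the cluster's core-site.xml below sets it to kerberos. If the client has to mirror that
setting, the change may be as small as the sketch below, though I have not confirmed it:

            final Configuration conf = new Configuration();
            // The cluster sets hadoop.security.authentication to kerberos, but a
            // Configuration built in code defaults to "simple" unless the cluster
            // config files are on the classpath, which may explain the
            // "(auth:SIMPLE)" in the stack trace.
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);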

Below are the Hadoop cluster configuration files:

***************
core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.456Z-->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://xxxxx.yyyy.com:9920</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value></value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>RULE:[1:$1@$0](.*@\Q\E$)s/@\Q\E$//
RULE:[2:$1@$0](.*@\Q\E$)s/@\Q\E$//
DEFAULT</value>
  </property>
</configuration>


hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.467Z-->
<configuration>
  <property>
    <name>dfs.https.address</name>
    <value>xxxxx.yyyy.com:50470</value>
  </property>
  <property>
    <name>dfs.https.port</name>
    <value>50470</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>xxxxx.yyyy.com:50070</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>hdfs/_HOST@RND.HDFS.COM</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.https.principal</name>
    <value>host/_HOST@RND.HDFS.COM</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.internal.spnego.principal</name>
    <value>HTTP/_HOST@RND.HDFS.COM</value>
  </property>
</configuration>


mapred-site.xml


<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera CM on 2012-11-06T20:18:31.456Z-->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>abcde.yyyy.com:9921</value>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>64</value>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.05</value>
  </property>
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.8</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.submit.replication</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>72</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>256</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value> -Xmx1073741824</value>
  </property>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>1.0</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.kerberos.principal</name>
    <value>mapred/_HOST@RND.HDFS.COM</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.kerberos.https.principal</name>
    <value>host/_HOST@RND.HDFS.COM</value>
  </property>
</configuration>


***************
