hadoop-mapreduce-issues mailing list archives

From "kumar ranganathan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
Date Mon, 16 Feb 2015 10:58:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322645#comment-14322645 ]

kumar ranganathan commented on MAPREDUCE-5903:
----------------------------------------------

I just found a solution to this problem, and it may point at the root cause: it seems that the user who submits the job (an Active Directory user) must have admin privileges.

The NodeManager log says:

Caused by: java.io.IOException: Owner 'Administrators' for path \tmp\hadoop-Seekay\nm-local-dir\usercache\Seekay\appcache\application_1423805493973_0004\output\attempt_1423805493973_0004_m_000000_1\file.out.index did not match expected owner 'Seekay'

The exception above is thrown from the owner check in SecureIOUtils#checkStat; on Windows the check falls back to accepting the Administrators group as the owner:

{code:title=SecureIOUtils.java|borderStyle=solid}
  private static void checkStat(File f, String owner, String group,
      String expectedOwner,
      String expectedGroup) throws IOException {
    boolean success = true;
    if (expectedOwner != null &&
        !expectedOwner.equals(owner)) {
      if (Path.WINDOWS) {
        UserGroupInformation ugi =
            UserGroupInformation.createRemoteUser(expectedOwner);
        final String adminsGroupString = "Administrators";
        success = owner.equals(adminsGroupString)
            && Arrays.asList(ugi.getGroupNames()).contains(adminsGroupString);
      } else {
        success = false;
      }
    }
    // Some code here......................
  }
{code}
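
In effect, on Windows the check only passes when the expected owner (the user the job ran as) is itself a member of the Administrators group. That membership can be verified ahead of time with the same UGI call checkStat uses; a minimal sketch, assuming hadoop-common is on the classpath (the class name and the default user below are made up for illustration):

{code:java}
import java.util.Arrays;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical helper, not part of any patch: prints whether a given user
// resolves to the Administrators group on this node, i.e. whether the
// Windows fallback in SecureIOUtils#checkStat would accept it as owner.
public class CheckAdminsMembership {
  public static void main(String[] args) throws Exception {
    String user = args.length > 0 ? args[0] : "Seekay"; // job-submitting user
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser(user);
    boolean inAdmins =
        Arrays.asList(ugi.getGroupNames()).contains("Administrators");
    System.out.println(user + " is in Administrators: " + inAdmins);
  }
}
{code}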

I just added the user to one of the admin groups in Active Directory, and the MapReduce job then ran successfully.
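
For reference, the failing scenario is the impersonated submission (use case 2 in the description below). A rough sketch of that path, with placeholder principal, keytab and user names, and assuming the hadoop.proxyuser.* settings allow the impersonation:

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical driver, not from this issue: submits a job as an LDAP user
// impersonated through the yarn Kerberos login.
public class SubmitAsProxyUser {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    // Real (Kerberos) identity of the submitting service.
    UserGroupInformation realUgi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "yarn/host.example.com@EXAMPLE.COM", "/etc/security/keytabs/yarn.service.keytab");
    // LDAP user being impersonated.
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser("Seekay", realUgi);
    proxyUgi.doAs(new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() throws Exception {
        Job job = Job.getInstance(conf, "word count");
        // ... set jar, mapper, reducer and input/output paths as in the stock WordCount example ...
        job.waitForCompletion(true);
        return null;
      }
    });
  }
}
{code}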

> If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5903
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>         Environment: hadoop: 2.4.0.2.1.2.0
>            Reporter: Victor Kim
>            Priority: Critical
>              Labels: shuffle
>
> I have a 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers, Kerberos is enabled, and I have hdfs, yarn, mapred principals\keytabs. ResourceManager and NodeManager are run under the yarn user, using the yarn Kerberos principal.
> Use case 1: WordCount, submit job using yarn UGI (i.e. superuser, the one having a Kerberos principal on all boxes). Result: job successfully completed.
> Use case 2: WordCount, submit job using LDAP user impersonation via yarn UGI. Result: Map tasks complete successfully, the Reduce task fails with ShuffleError, Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES (see the stack trace below).
> The use case with user impersonation used to work on earlier versions, without YARN (with JT&TT).
> I found a similar issue with Kerberos AUTH involved here: https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
> And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it's marked as resolved, which is not the case when Kerberos Authentication is enabled.
> The exception trace from the YarnChild JVM:
> 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
> 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:416)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
