hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (MAPREDUCE-5903) If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
Date Sat, 09 May 2015 00:54:00 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-5903.
------------------------------------------------
    Resolution: Invalid

My comment [above|https://issues.apache.org/jira/browse/MAPREDUCE-5903?focusedCommentId=14152049&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14152049]
still holds. This is not supported today.

If you don't want to create user accounts, then you can do the following:
 - Find a local unix user to map all kerberos/LDAP-authenticated users to
 - Set yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users to true
 - Set yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user to that local unix user

For example, the default for this is nobody, which means all jobs will run as the nobody unix
user. Clearly this has other security implications, since all jobs run as the same user.
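
For reference, a minimal yarn-site.xml sketch of the two settings above (the property names are the ones listed; nobody is shown only because it is the default, substitute whatever local unix user you picked in the first step):

{code:xml}
<!-- yarn-site.xml on the NodeManagers -->
<property>
  <!-- When true, run every container as the single configured local-user. -->
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
  <value>true</value>
</property>
<property>
  <!-- The local unix user all kerberos/LDAP-authenticated users are mapped to; "nobody" is the default. -->
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
  <value>nobody</value>
</property>
{code}

With this in place there is nothing per-user to provision on the NodeManagers, at the cost noted above of every job running as the same account.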

Closing this as invalid for now.

> If Kerberos Authentication is enabled, MapReduce job is failing on reducer phase
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5903
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5903
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>         Environment: hadoop: 2.4.0.2.1.2.0
>            Reporter: Victor Kim
>            Priority: Critical
>              Labels: shuffle
>
> I have a 3-node cluster configuration: 1 ResourceManager and 3 NodeManagers. Kerberos is enabled,
> with hdfs, yarn and mapred principals/keytabs. ResourceManager and NodeManager are run under the
> yarn user, using the yarn Kerberos principal.
> Use case 1: WordCount, job submitted using the yarn UGI (i.e. the superuser, the one having a
> Kerberos principal on all boxes). Result: job completed successfully.
> Use case 2: WordCount, job submitted using LDAP user impersonation via the yarn UGI (a hypothetical
> submission sketch follows the quoted report). Result: map tasks complete successfully, the reduce
> task fails with ShuffleError caused by java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES
> (see the stack trace below).
> The use case with user impersonation used to work on earlier versions, without YARN (with
> JT&TT).
> I found a similar issue with Kerberos auth involved here: https://groups.google.com/forum/#!topic/nosql-databases/tGDqs75ACqQ
> And here https://issues.apache.org/jira/browse/MAPREDUCE-4030 it is marked as resolved, which is
> not the case when Kerberos authentication is enabled.
> The exception trace from YarnChild JVM:
> 2014-05-21 12:49:35,687 FATAL [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Shuffle failed with too many fetch failures and insufficient progress!
> 2014-05-21 12:49:35,688 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:416)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
>         at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
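
For readers unfamiliar with the impersonation path described in use case 2 above, the sketch below shows one common way to submit a job as a proxied user via UserGroupInformation.createProxyUser. It is only a hypothetical illustration, not the reporter's actual code: the principal, keytab path, the user name ldapuser, and the input/output paths are all assumptions, and it presumes the usual hadoop.proxyuser.* rules are already configured for the superuser.

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.WordCount;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyWordCount {
  public static void main(final String[] args) throws Exception {
    final Configuration conf = new Configuration();

    // Kerberos login as the superuser principal; principal and keytab path are placeholders.
    UserGroupInformation.loginUserFromKeytab("yarn/host.example.com@EXAMPLE.COM",
        "/etc/security/keytabs/yarn.service.keytab");

    // Impersonate the LDAP user ("ldapuser" is a placeholder); requires
    // hadoop.proxyuser.yarn.hosts/groups to allow this superuser to proxy.
    UserGroupInformation proxyUgi =
        UserGroupInformation.createProxyUser("ldapuser", UserGroupInformation.getLoginUser());

    // Everything inside doAs() runs as the proxied user, including job submission.
    boolean ok = proxyUgi.doAs((PrivilegedExceptionAction<Boolean>) () -> {
      Job job = Job.getInstance(conf, "wordcount-as-ldapuser");
      job.setJarByClass(ProxyWordCount.class);
      job.setMapperClass(WordCount.TokenizerMapper.class);   // from hadoop-mapreduce-examples
      job.setCombinerClass(WordCount.IntSumReducer.class);
      job.setReducerClass(WordCount.IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      return job.waitForCompletion(true);
    });
    System.exit(ok ? 0 : 1);
  }
}
{code}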



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
