hadoop-common-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: Shuffle Error after enabling Kerberos authentication
Date Sun, 20 Apr 2014 03:41:08 GMT
(bump) This is a good question.

I'm new to Kerberos as well, and have been wondering how to prevent
scenarios like this from happening.

My thought is that since Kerberos, IIRC, requires a ticket for each
client/service pair that works together, there may be a chance that if
*any* two nodes in a cluster haven't been initialized with the right tickets
to talk to each other, an error can occur during shuffle-sort, because
so much distributed copying is going on.

In any case, I'd love to know of any good smoke tests for a large
kerberized Hadoop cluster that don't require running a MapReduce job.
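
As a starting point for that kind of smoke test, here is a minimal Python
sketch that chains three standard commands: kinit from a keytab, klist -s
to confirm the ticket cache is valid, and an authenticated hdfs dfs -ls
against the NameNode. The keytab path, the principal, and the helper names
(build_checks, run_command, first_failure) are all hypothetical. It only
exercises the KDC and HDFS, not the shuffle handler on each node, so a clean
run does not rule out the shuffle error discussed in this thread.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical values: replace with a real keytab and principal for your cluster.
KEYTAB = "/etc/security/keytabs/smoke.keytab"
PRINCIPAL = "smoke@EXAMPLE.COM"

Check = Tuple[str, List[str]]


def build_checks(keytab: str, principal: str) -> List[Check]:
    """Return (label, argv) pairs, ordered so each step depends on the previous one."""
    return [
        # Obtain a ticket-granting ticket from the keytab (talks to the KDC).
        ("kinit", ["kinit", "-kt", keytab, principal]),
        # Exits 0 only if the credential cache is readable and not expired.
        ("klist", ["klist", "-s"]),
        # An authenticated call to the NameNode; fails if HDFS rejects the ticket.
        ("hdfs-ls", ["hdfs", "dfs", "-ls", "/"]),
    ]


def run_command(argv: Sequence[str]) -> int:
    """Run one command quietly and return its exit code (127 if the binary is missing)."""
    try:
        result = subprocess.run(
            list(argv),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode
    except FileNotFoundError:
        return 127


def first_failure(
    checks: Sequence[Check],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> Optional[str]:
    """Run the checks in order and return the label of the first one that
    exits non-zero, or None if all of them succeed."""
    for label, argv in checks:
        if runner(argv) != 0:
            return label
    return None
```

Calling first_failure(build_checks(KEYTAB, PRINCIPAL)) on a node returns None
when every step exits 0, and otherwise the label of the step that failed.
The runner argument exists so the logic can be tried without a cluster by
passing a stand-in function; running it on every node would need something
like ssh around it, which is left out here.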



On Sat, Apr 19, 2014 at 11:11 PM, Mike <mike@unitedrmr.com> wrote:

> Unsubscribe
>
> > On Apr 19, 2014, at 5:32 AM, Terance Dias <terance.dias@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm using Apache hadoop-2.1.0-beta. I'm able to set up a basic
> > multi-node cluster and run MapReduce jobs. But when I enable Kerberos
> > authentication, the reduce task fails with the following error.
> >
> > Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
> >       at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:121)
> >       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
> >       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
> >       at java.security.AccessController.doPrivileged(Native Method)
> >       at javax.security.auth.Subject.doAs(Subject.java:396)
> >       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
> >       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> > Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> >       at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:311)
> >       at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:243)
> >       at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
> >       at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
> >
> > I did a search and found that people have generally seen this error when
> > their network configuration is not correct, so the data nodes are not
> > able to communicate with each other to shuffle the data. I don't think that
> > is the problem in my case, because everything works fine if Kerberos
> > authentication is disabled. Any idea what the problem could be?
> >
> > Thanks,
> > Terance.
> >
>



-- 
Jay Vyas
http://jayunit100.blogspot.com
