Date: Sat, 19 Apr 2014 18:02:20 +0530
Subject: Shuffle Error after enabling Kerberos authentication
From: Terance Dias
To: user@hadoop.apache.org

Hi,

I'm using Apache hadoop-2.1.0-beta. I'm able to set up a basic multi-node cluster and run MapReduce jobs. But when I enable Kerberos authentication, the reduce task fails with the following error.

Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:121)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:311)
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:243)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)

I did a search and found that people have generally seen this error when their network configuration is not correct, so the data nodes are not able to communicate with each other to shuffle the data. I don't think that is the problem in my case, because everything works fine if Kerberos authentication is disabled. Any idea what the problem could be?

Thanks,
Terance.
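The network explanation mentioned above can be ruled in or out with a plain TCP check against the shuffle port of each node. The following is only a minimal sketch under stated assumptions: `ShufflePortCheck` and `canConnect` are hypothetical names (not part of Hadoop), and 13562 is assumed to be the shuffle port, which is the default for `mapreduce.shuffle.port` unless the cluster's mapred-site.xml overrides it. A successful connection only shows that the port is reachable; it does not exercise the authentication performed by the secure shuffle.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper, not part of Hadoop: checks TCP reachability of a shuffle port. */
class ShufflePortCheck {

    // Assumption: default value of mapreduce.shuffle.port; adjust if the cluster overrides it.
    static final int DEFAULT_SHUFFLE_PORT = 13562;

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Each argument is the host name of a node running a NodeManager.
        for (String host : args) {
            boolean ok = canConnect(host, DEFAULT_SHUFFLE_PORT, 3000);
            System.out.println(host + ":" + DEFAULT_SHUFFLE_PORT
                    + (ok ? " reachable" : " NOT reachable"));
        }
    }
}
```

Running it from every node with the host names of all the other nodes as arguments would show whether any pair of nodes cannot reach each other on that port. If every pair is reachable both with and without Kerberos, the cause is more likely in the security configuration than in the network.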