hadoop-mapreduce-dev mailing list archives

From Prashant Sharma <prashant.ii...@gmail.com>
Subject Re: Problem while submitting jobs to NM started with ephemeral ports.
Date Mon, 17 Oct 2011 09:05:30 GMT
I also tried commenting out the last two properties in yarn-site
from my earlier message, while keeping the following property in mapred-site:

    <property>
      <name>mapreduce.shuffle.port</name>
      <value>0</value>
    </property>

I got this exception while running a wordcount.

 mapreduce.Job (Job.java:printTaskEvents(1315)) - Task Id : attempt_1318840789401_0005_r_000000_0, Status : FAILED
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#5
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:126)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:365)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
	at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
	at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:227)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)


Otherwise, everything works out of the box.
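
For context on the value 0 used above: a port of 0 asks the operating system to pick a free ephemeral port when the socket is bound, so the configured value and the port actually in use differ. If some component then advertises the configured value rather than the bound one, clients would be directed to port 0, which may be what is happening here (that part is an assumption, not something confirmed from the logs). A minimal sketch of the distinction using plain java.net, not Hadoop code; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    // Stands in for a configured value such as <value>0</value>.
    static final int CONFIGURED_PORT = 0;

    // Binds to the configured port and returns the port the OS actually assigned.
    static int boundPort() throws IOException {
        try (ServerSocket server = new ServerSocket(CONFIGURED_PORT)) {
            // With port 0 the real port is only known after the bind.
            return server.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        int actual = boundPort();
        System.out.println("configured port: " + CONFIGURED_PORT);
        // This is the value a server would have to advertise to its clients.
        System.out.println("bound port:      " + actual);
    }
}
```

Running it prints 0 for the configured port and some OS-chosen port number for the bound one; only the latter is usable by a client.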

Thanks,
Prashant.

On Mon, Oct 17, 2011 at 2:03 PM, Prashant Sharma
<prashant.iiith@gmail.com> wrote:
> I am using the following properties in yarn-site
>
> <property>
> <name>yarn.nodemanager.aux-services</name>
> <value>mapreduce.shuffle</value>
> </property>
>  <property>
> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
> <value>org.apache.hadoop.mapred.ShuffleHandler</value>
> </property>
>  <property>
>    <name>yarn.nodemanager.address</name>
>    <value>localhost:0</value>
>  </property>
>  <property>
>    <name>yarn.nodemanager.localizer.address</name>
>    <value>localhost:0</value>
>  </property>
>
> All daemons start without errors. But when I try to submit a job, the
> job gets stuck and the NM log says it is trying to connect to
> 'localhost:0'. Localization takes forever. Why?
>
> Please see the NM logs below.
>
> http://pastebin.com/QfQDZeqF
>
> Thanks,
> Prashant
>
