hadoop-mapreduce-user mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: MRv2 jobs fail when run with more than one slave
Date Tue, 17 Jul 2012 23:04:10 GMT
Trevor,

It's hard for folks here to help you with CDH patchsets (it's their call what they include).
Could you please try with vanilla Apache hadoop-2.0.0-alpha? I'll try to help out from there.

thanks,
Arun

On Jul 17, 2012, at 2:24 PM, Trevor wrote:

> Hi all,
> 
> I recently upgraded from CDH4b2 (0.23.1) to CDH4 (2.0.0). Now for some strange
> reason, my MRv2 jobs (TeraGen, specifically) fail if I run with more than one
> slave. For every slave except the one running the Application Master, I get the
> following failed tasks and warnings repeatedly:
> 
> 12/07/13 14:21:55 INFO mapreduce.Job: Running job: job_1342207265272_0001
> 12/07/13 14:22:17 INFO mapreduce.Job: Job job_1342207265272_0001 running in uber mode : false
> 12/07/13 14:22:17 INFO mapreduce.Job:  map 0% reduce 0%
> 12/07/13 14:22:46 INFO mapreduce.Job:  map 1% reduce 0%
> 12/07/13 14:22:52 INFO mapreduce.Job:  map 2% reduce 0%
> 12/07/13 14:22:55 INFO mapreduce.Job:  map 3% reduce 0%
> 12/07/13 14:22:58 INFO mapreduce.Job:  map 4% reduce 0%
> 12/07/13 14:23:04 INFO mapreduce.Job:  map 5% reduce 0%
> 12/07/13 14:23:07 INFO mapreduce.Job:  map 6% reduce 0%
> 12/07/13 14:23:07 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000004_0, Status : FAILED
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stdout
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stderr
> 12/07/13 14:23:08 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000003_0, Status : FAILED
> 12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000003_0&filter=stdout
> ...
> 12/07/13 14:25:12 INFO mapreduce.Job:  map 25% reduce 0%
> 12/07/13 14:25:12 INFO mapreduce.Job: Job job_1342207265272_0001 failed with state FAILED due to:
> ...
>                 Failed map tasks=19
>                 Launched map tasks=31
> 
> The HTTP 400 error appears to be generated by the ShuffleHandler, which is
> configured to run on port 8080 of the slaves, and doesn't understand that URL.
> What I've been able to piece together so far is that /tasklog is handled by
> the TaskLogServlet, which is part of the TaskTracker. However, isn't this an
> MRv1 class that shouldn't even be running in my configuration? Also, the
> TaskTracker appears to run on port 50060, so I don't know where port 8080 is
> coming from.
> 
> Though it could be a red herring, this warning seems to be related to the job
> failing, despite the fact that the job makes progress on the slave running the
> AM. The Node Manager logs on both AM and non-AM slaves appear fairly similar,
> and I don't see any errors in the non-AM logs.
> 
> Another strange data point: These failures occur running the slaves on ARM
> systems. Running the slaves on x86 with the same configuration works. I'm using
> the same tarball on both, which means that the native-hadoop library isn't
> loaded on ARM. The master/client is the same x86 system in both scenarios. All
> nodes are running Ubuntu 12.04.
> 
> Thanks for any guidance,
> Trevor
> 
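
For anyone triaging a similar client log, here is a minimal Python sketch (not part of the original thread) that tallies the failed attempts and the endpoints the client tried to fetch task output from. The sample lines are copied from the excerpt quoted above, on the assumption that the URL was split only by line wrapping; the `summarize` helper and both regular expressions are hypothetical names for illustration and are not part of Hadoop.

```python
import re
from urllib.parse import urlsplit

# Sample client log lines from the quoted message, with wrapped lines rejoined
# (assumption: "http://" and the host were split only by mail line wrapping).
LOG = """\
12/07/13 14:23:07 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000004_0, Status : FAILED
12/07/13 14:23:08 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://perfgb0n0:8080/tasklog?plaintext=true&attemptid=attempt_1342207265272_0001_m_000004_0&filter=stdout
12/07/13 14:23:08 INFO mapreduce.Job: Task Id : attempt_1342207265272_0001_m_000003_0, Status : FAILED
"""

# Hypothetical patterns matching the two message shapes seen in the excerpt.
FAILED_RE = re.compile(r"Task Id : (attempt_\w+), Status : FAILED")
URL_RE = re.compile(r"HTTP response code: (\d+) for URL: (\S+)")


def summarize(log_text):
    """Return (failed attempt ids, set of (host, port, path, http_status))."""
    failed = []
    endpoints = set()
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            failed.append(match.group(1))
            continue
        match = URL_RE.search(line)
        if match:
            url = urlsplit(match.group(2))
            endpoints.add((url.hostname, url.port, url.path, int(match.group(1))))
    return failed, endpoints


if __name__ == "__main__":
    failed_attempts, task_log_endpoints = summarize(LOG)
    print(failed_attempts)
    print(task_log_endpoints)
```

Grouping by host and port makes it easy to see whether every rejected /tasklog request is going to the same port, which is the question raised in the message about port 8080 versus 50060.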

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


