hadoop-mapreduce-user mailing list archives

From Russell Brown <misterr...@gmail.com>
Subject Re: Never ending reduce jobs, error Error reading task outputConnection refused
Date Fri, 04 Nov 2011 15:59:34 GMT
Hi Robert,
Thanks for the reply. The version of Hadoop is hadoop-0.20.203.0.

It is weird that this is only a problem when the amount of data goes up.

My setup might be to blame; this is all a learning process for me, so I have 5 VMs running.
One VM is the JobTracker/NameNode, and the other four are DataNode/TaskTracker nodes. They
can all ping each other and ssh to each other OK.

Cheers

Russell
On 4 Nov 2011, at 15:39, Robert Evans wrote:

> I am not sure what is causing this, but yes, they are related. In Hadoop the map output
> is served to the reducers through Jetty, which is an embedded web server. If the reducers
> are not able to fetch the map outputs, then they assume that the mapper is bad, and a new
> mapper is relaunched to recompute the map output. From the errors it looks like the map
> output is being deleted/not showing up for some of the mappers. I am not really sure why
> that would be happening. What version of Hadoop are you using?
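
That fetch is a plain HTTP GET against the TaskTracker's embedded Jetty, so it can be
reproduced by hand. A minimal sketch in Java, assuming the default TaskTracker HTTP port
of 50060, a placeholder host name, and the failing attempt ID from the logs below; the
job/map/reduce parameter names are what 0.20's MapOutputServlet reads, but verify against
your build:

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class FetchMapOutput {
        public static void main(String[] args) throws Exception {
            // The same GET a reducer issues during the shuffle. A refused
            // connection throws from getResponseCode(); a 4xx/5xx response
            // means Jetty is up but the map output is missing on disk.
            URL url = new URL("http://slave1:50060/mapOutput"
                    + "?job=job_201111040342_0006"
                    + "&map=attempt_201111040342_0006_m_000015_0"
                    + "&reduce=2");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            System.out.println("HTTP " + conn.getResponseCode());
            long total = 0;
            try (InputStream in = conn.getInputStream()) {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) {
                    total += n;
                }
            }
            System.out.println("read " + total + " bytes of map output");
        }
    }

Running this by hand separates a network problem ("Connection refused") from the
missing-map-output case seen in the TaskTracker logs below.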
> 
> --Bobby Evans
> 
> On 11/4/11 10:28 AM, "Russell Brown" <misterruss@gmail.com> wrote:
> 
> Hi,
> I have a cluster of 4 TaskTracker/DataNodes and 1 JobTracker/NameNode. I can run small
> jobs on this cluster fine (up to a few thousand keys), but with more than that I start
> seeing errors like this:
> 
> 
> 11/11/04 08:16:08 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000005_0, Status : FAILED
> Too many fetch-failures
> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
> 11/11/04 08:16:13 INFO mapred.JobClient:  map 97% reduce 1%
> 11/11/04 08:16:25 INFO mapred.JobClient:  map 100% reduce 1%
> 11/11/04 08:17:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000010_0, Status : FAILED
> Too many fetch-failures
> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
> 11/11/04 08:17:24 INFO mapred.JobClient:  map 97% reduce 1%
> 11/11/04 08:17:36 INFO mapred.JobClient:  map 100% reduce 1%
> 11/11/04 08:19:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000011_0, Status : FAILED
> Too many fetch-failures
> 
> 
> 
> I have no IDEA what this means. All my nodes can ssh to each other passwordlessly, all
> the time.
> 
> On the individual data/task nodes the logs have errors like this:
> 
> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201111040342_0006_m_000015_0,2) failed :
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/vagrant/jobcache/job_201111040342_0006/attempt_201111040342_0006_m_000015_0/output/file.out.index in any of the configured local directories
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
>         at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3543)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>         at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>         at org.mortbay.jetty.Server.handle(Server.java:326)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>         at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>         at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> 
> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_201111040342_0006_m_000015_0. Ignored.
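
The DiskChecker error above means the TaskTracker looked for the attempt's file.out.index
under every directory listed in mapred.local.dir and found nothing. A minimal sketch for
checking those directories on a slave node, assuming the 0.20 jars and that node's conf
directory are on the classpath (LocalDirCheck is a hypothetical helper, not part of Hadoop):

    import java.io.File;
    import org.apache.hadoop.mapred.JobConf;

    public class LocalDirCheck {
        public static void main(String[] args) {
            // Picks up mapred-site.xml (and the other *-site.xml files)
            // from the classpath, the same way a TaskTracker would.
            JobConf conf = new JobConf();
            String[] dirs = conf.getStrings("mapred.local.dir");
            if (dirs == null) {
                System.out.println("mapred.local.dir not set; the default is "
                        + "${hadoop.tmp.dir}/mapred/local");
                return;
            }
            for (String dir : dirs) {
                File f = new File(dir);
                System.out.println(dir + "  exists=" + f.exists()
                        + "  writable=" + f.canWrite());
            }
        }
    }

If one of those directories is missing, unwritable, or cleaned out between the map and the
shuffle (a /tmp that the VMs wipe is a common culprit), the map outputs vanish and the
reducers' fetches fail with exactly these symptoms.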
> 
> 
> Are they related? What do any of them mean?
> 
> If I use a much smaller amount of data I don't see any of these errors and everything
> works fine, so I guess they have to do with some resource (though which one, I don't know).
> 
> Looking at MASTERNODE:50070/dfsnodelist.jsp?whatNodes=LIVE, I see that the datanodes have
> ample disk space, so that isn't it…
> 
> Any help at all is really appreciated. Searching for the errors on Google got me nothing,
> and reading the Hadoop definitive guide got me nothing.
> 
> Many thanks in advance
> 
> Russell 
> 

