hadoop-mapreduce-user mailing list archives

From Russell Brown <misterr...@gmail.com>
Subject Re: Never ending reduce jobs, error Error reading task outputConnection refused
Date Fri, 04 Nov 2011 16:09:07 GMT
Done so, working, Awesome and many many thanks!

Cheers

Russell
On 4 Nov 2011, at 16:06, Uma Maheswara Rao G 72686 wrote:

> ----- Original Message -----
> From: Russell Brown <misterruss@gmail.com>
> Date: Friday, November 4, 2011 9:18 pm
> Subject: Re: Never ending reduce jobs, error Error reading task outputConnection refused
> To: mapreduce-user@hadoop.apache.org
> 
>> 
>> On 4 Nov 2011, at 15:44, Uma Maheswara Rao G 72686 wrote:
>> 
>>> ----- Original Message -----
>>> From: Russell Brown <misterruss@gmail.com>
>>> Date: Friday, November 4, 2011 9:11 pm
>>> Subject: Re: Never ending reduce jobs, error Error reading task outputConnection refused
>>> To: mapreduce-user@hadoop.apache.org
>>> 
>>>> 
>>>> On 4 Nov 2011, at 15:35, Uma Maheswara Rao G 72686 wrote:
>>>> 
>>>>> This problem may occur if you don't configure the host mappings properly.
>>>>> Can you check whether your tasktrackers are pingable from each other
>>>>> with the configured hostnames?
>>>> 
>>>> 
>>>> Hi,
>>>> Thanks for replying so fast!
>>>> 
>>>> Hostnames? I use IP addresses in the slaves config file, and via
>>>> IP addresses everyone can ping everyone else. Do I need to set up
>>>> hostnames too?
>>> Yes. Can you configure hostname mappings and check?
>> 
>> Like full-blown DNS? I mean, there is no reference to any machine
>> by hostname in any of my config anywhere, so I'm not sure where to
>> start. These machines are just on my local network.
> You need to configure them in the /etc/hosts file, for example:
>    xx.xx.xx.xx1 TT_HOSTNAME1
>    xx.xx.xx.xx2 TT_HOSTNAME2
>    xx.xx.xx.xx3 TT_HOSTNAME3
>    xx.xx.xx.xx4 TT_HOSTNAME4
> Configure them on all the machines and check.
>> 
>>>> 
>>>> Cheers
>>>> 
>>>> Russell
>>>>> 
>>>>> Regards,
>>>>> Uma
>>>>> ----- Original Message -----
>>>>> From: Russell Brown <misterruss@gmail.com>
>>>>> Date: Friday, November 4, 2011 9:00 pm
>>>>> Subject: Never ending reduce jobs, error Error reading task outputConnection refused
>>>>> To: mapreduce-user@hadoop.apache.org
>>>>> 
>>>>>> Hi,
>>>>>> I have a cluster of 4 tasktracker/datanodes and 1
>>>>>> JobTracker/Namenode. I can run small jobs on this cluster fine
>>>>>> (like up to a few thousand keys) but more than that and I start
>>>>>> seeing errors like this:
>>>>>> 
>>>>>> 
>>>>>> 11/11/04 08:16:08 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000005_0, Status : FAILED
>>>>>> Too many fetch-failures
>>>>>> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>> 11/11/04 08:16:13 INFO mapred.JobClient:  map 97% reduce 1%
>>>>>> 11/11/04 08:16:25 INFO mapred.JobClient:  map 100% reduce 1%
>>>>>> 11/11/04 08:17:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000010_0, Status : FAILED
>>>>>> Too many fetch-failures
>>>>>> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
>>>>>> 11/11/04 08:17:24 INFO mapred.JobClient:  map 97% reduce 1%
>>>>>> 11/11/04 08:17:36 INFO mapred.JobClient:  map 100% reduce 1%
>>>>>> 11/11/04 08:19:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000011_0, Status : FAILED
>>>>>> Too many fetch-failures
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I have no idea what this means. All my nodes can ssh to each
>>>>>> other, passwordlessly, all the time.
>>>>>> 
>>>>>> On the individual data/task nodes the logs have errors like this:
>>>>>> 
>>>>>> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201111040342_0006_m_000015_0,2) failed :
>>>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/vagrant/jobcache/job_201111040342_0006/attempt_201111040342_0006_m_000015_0/output/file.out.index in any of the configured local directories
>>>>>> 	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
>>>>>> 	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
>>>>>> 	at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3543)
>>>>>> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>>>>> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>>>>> 	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>>>>>> 	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>>>>>> 	at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816)
>>>>>> 	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>>>>>> 	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>>>>>> 	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>>>>> 	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>>>>>> 	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>>>>>> 	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>>>>>> 	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>>>>>> 	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>>>>>> 	at org.mortbay.jetty.Server.handle(Server.java:326)
>>>>>> 	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>>>>>> 	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>>>>>> 	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>>>>>> 	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>>>>>> 	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>>>>>> 	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>>>>>> 	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>>>>>> 
>>>>>> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_201111040342_0006_m_000015_0. Ignored.
>>>>>> 
>>>>>> 
>>>>>> Are they related? What do any of them mean?
>>>>>> 
>>>>>> If I use a much smaller amount of data I don't see any of these
>>>>>> errors and everything works fine, so I guess they are to do with
>>>>>> some resource (though which, I don't know). Looking at
>>>>>> MASTERNODE:50070/dfsnodelist.jsp?whatNodes=LIVE
>>>>>> I see that the datanodes have ample disk space, so that isn't it…
>>>>>> 
>>>>>> Any help at all is really appreciated. Searching for the errors on
>>>>>> Google has got me nothing, and reading Hadoop: The Definitive Guide
>>>>>> has got me nothing.
>>>>>> Many thanks in advance
>>>>>> 
>>>>>> Russell
>>>> 
>>>> 
>>> Regards,
>>> Uma
>> 
>> 
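
The fix that worked in this thread was an /etc/hosts entry for every TaskTracker, present on every machine. Below is a minimal Python sketch for checking that, not something posted in the thread. The host names tt1 to tt4 and the 192.0.2.x addresses are placeholder assumptions, and parse_hosts, missing_hosts and unresolvable are hypothetical helper names; substitute the real names and the contents of your own /etc/hosts.

```python
import socket

# Placeholder /etc/hosts content (assumed values, not from the thread).
SAMPLE_HOSTS = """\
127.0.0.1   localhost
192.0.2.11  tt1   # tasktracker 1
192.0.2.12  tt2
192.0.2.13  tt3
"""


def parse_hosts(text):
    """Return {hostname: ip} from /etc/hosts-style text, ignoring comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first address listed for a name.
            mapping.setdefault(name, ip)
    return mapping


def missing_hosts(hosts_text, expected_names):
    """Return the expected host names that have no entry in hosts_text."""
    mapping = parse_hosts(hosts_text)
    return [name for name in expected_names if name not in mapping]


def unresolvable(names):
    """Return the names the local resolver cannot turn into an address."""
    failed = []
    for name in names:
        try:
            socket.gethostbyname(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed


# tt4 has no line in SAMPLE_HOSTS, so it is reported as missing.
print(missing_hosts(SAMPLE_HOSTS, ["tt1", "tt2", "tt3", "tt4"]))  # ['tt4']
```

On a real node, pass the text of /etc/hosts to missing_hosts, or call unresolvable with the TaskTracker names, and repeat on each machine, since the mapping has to be present on all of them.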

