hadoop-common-user mailing list archives

From Scott Whitecross <sc...@dataxu.com>
Subject Re: Frustrated with Cluster Setup: Reduce Tasks Stop at 16% - could not find taskTracker/jobcache...
Date Thu, 30 Oct 2008 18:02:02 GMT
So it's not just at 16%; it depends on the task:
2008-10-30 13:58:29,702 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200810301345_0001_r_000000_0 0.25675678% reduce > copy (57 of 74 at 13.58 MB/s) >
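The number in that log line appears to be a fraction rather than a percentage: 57/74 is about 0.770, and a third of that is about 0.2568, which matches the logged 0.25675678. That is consistent with reduce progress being split into three equally weighted phases (copy, sort, reduce). Under that assumption, which is inferred from these numbers rather than taken from Hadoop's source, a reduce that stalls at 16% has copied roughly 48% of its map outputs. A minimal sketch of the arithmetic:

```java
// Sketch: reduce-task progress during the copy phase, ASSUMING the reduce is
// split into three equally weighted phases (copy, sort, reduce).
public final class ReduceProgress {
    private ReduceProgress() {}

    /** Fraction of the whole reduce task completed while still copying map outputs. */
    static double copyPhaseProgress(int mapsCopied, int totalMaps) {
        if (totalMaps <= 0 || mapsCopied < 0 || mapsCopied > totalMaps) {
            throw new IllegalArgumentException("expected 0 <= copied <= total and total > 0");
        }
        double copyFraction = (double) mapsCopied / totalMaps;
        return copyFraction / 3.0; // copy is the first of three phases
    }

    /** Inverse: fraction of map outputs copied when the task reports `progress`. */
    static double copyFractionAt(double progress) {
        return progress * 3.0;
    }

    public static void main(String[] args) {
        // Figures from the log line above: "reduce > copy (57 of 74 ...)".
        System.out.println("progress at 57 of 74: " + copyPhaseProgress(57, 74));
        // A reduce stuck at 16% would have copied about this fraction of its inputs.
        System.out.println("copied at 16%: " + copyFractionAt(0.16));
    }
}
```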

2008-10-30 13:58:29,357 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200810301345_0001_m_000048_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200810301345_0001/attempt_200810301345_0001_m_000048_0/output/file.out.index in any of the configured local directories
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
	at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2402)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
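The exception says the TaskTracker serving the map output looked for a relative path (taskTracker/jobcache/<job>/<attempt>/output/file.out.index) under each configured local directory and found it in none of them. The following is a simplified sketch of that kind of lookup, not Hadoop's actual LocalDirAllocator. It assumes mapred.local.dir is left at its stock default of ${hadoop.tmp.dir}/mapred/local, which with the hadoop.tmp.dir in the quoted hadoop-site.xml would be /opt/hadoop-datastore/mapred/local.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Simplified sketch (NOT Hadoop's LocalDirAllocator): look for a map attempt's
// output index under each configured local directory and return the first hit.
public final class MapOutputLocator {
    private MapOutputLocator() {}

    /** Relative path of the index file, following the layout shown in the error message. */
    static Path indexFileFor(String jobId, String attemptId) {
        return Paths.get("taskTracker", "jobcache", jobId, attemptId, "output", "file.out.index");
    }

    /** Returns the first existing copy of `relative` under `localDirs`, else throws. */
    static Path findInLocalDirs(List<Path> localDirs, Path relative) throws FileNotFoundException {
        for (Path dir : localDirs) {
            Path candidate = dir.resolve(relative);
            if (Files.exists(candidate)) {
                return candidate;
            }
        }
        throw new FileNotFoundException("Could not find " + relative + " in any of " + localDirs);
    }

    public static void main(String[] args) {
        // ASSUMPTION: default mapred.local.dir derived from hadoop.tmp.dir=/opt/hadoop-datastore.
        List<Path> localDirs = List.of(Paths.get("/opt/hadoop-datastore/mapred/local"));
        Path rel = indexFileFor("job_200810301345_0001", "attempt_200810301345_0001_m_000048_0");
        try {
            System.out.println("found: " + findInLocalDirs(localDirs, rel));
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Run on the box named in the failing fetch, this only tells you whether the file exists there; it cannot tell you on which node the map attempt actually ran.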

I'm out of ideas about what the problem could be.


On Oct 30, 2008, at 12:35 PM, Scott Whitecross wrote:

> I'm growing very frustrated with a simple cluster setup. I can get
> the cluster set up on two machines, but have trouble when trying to
> extend the installation to 3 or more boxes. I keep seeing the errors
> below. It seems the reduce tasks can't get access to the data.
>
> I can't seem to figure out how to fix this error. What amazes me
> is that the file-not-found errors appear on the master box as well as
> on the slaves. What would prevent the reduce tasks from finding the
> data via localhost?
>
> Setup/Errors:
>
> My basic setup comes from Michael Noll's guide:
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
> I've put the following in my /etc/hosts file:
>
> 127.0.0.1       localhost
> 10.1.1.12       master
> 10.1.1.10       slave
> 10.1.1.13       slave1
>
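Because the userlog quoted below shows the reducer fetching map output from http://localhost:50060, it may be worth confirming that none of the cluster names resolve to a loopback address on any box. A minimal sketch that parses /etc/hosts-style lines (keeping the first matching line for each name) and flags loopback mappings; the class and method names are illustrative only, and the sketch checks just the names passed to it:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: parse /etc/hosts-style lines and flag cluster names mapped to loopback.
public final class HostsCheck {
    private HostsCheck() {}

    /** Maps each hostname/alias to the address on the first line that lists it. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> byName = new HashMap<>();
        for (String raw : lines) {
            String line = raw.replaceFirst("#.*", "").trim(); // drop comments
            if (line.isEmpty()) continue;
            String[] fields = line.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                byName.putIfAbsent(fields[i], fields[0]);
            }
        }
        return byName;
    }

    static boolean isLoopback(String address) {
        return address.startsWith("127.") || address.equals("::1");
    }

    public static void main(String[] args) {
        List<String> hosts = List.of(
            "127.0.0.1       localhost",
            "10.1.1.12       master",
            "10.1.1.10       slave",
            "10.1.1.13       slave1");
        Map<String, String> byName = parse(hosts);
        for (String node : List.of("master", "slave", "slave1")) {
            String addr = byName.get(node);
            boolean bad = addr != null && isLoopback(addr);
            System.out.println(node + " -> " + addr + (bad ? "  (loopback!)" : ""));
        }
    }
}
```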
> I have also set up passwordless ssh to all boxes (and it works). All
> boxes can see each other, etc.
>
> My base hadoop-site.xml is:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <!-- Put site-specific property overrides in this file. -->
> <configuration>
>        <property>
>                <name>hadoop.tmp.dir</name>
>                <value>/opt/hadoop-datastore</value>
>        </property>
>        <property>
>                <name>fs.default.name</name>
>                <value>hdfs://master:54310</value>
>        </property>
>        <property>
>                <name>mapred.job.tracker</name>
>                <value>master:54311</value>
>        </property>
>        <property>
>                <name>dfs.replication</name>
>                <value>3</value>
>        </property>
> </configuration>
>
>
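This hadoop-site.xml never sets mapred.local.dir, so the "configured local directories" in the DiskErrorException would come from hadoop.tmp.dir, assuming the stock default of ${hadoop.tmp.dir}/mapred/local. A minimal sketch that reads the property pairs with the JDK's DOM parser (Hadoop's own Configuration class is deliberately not used, so this runs without Hadoop on the classpath) and derives that directory:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: read <property><name/><value/></property> pairs from a hadoop-site.xml.
public final class SiteXml {
    private SiteXml() {}

    static Map<String, String> properties(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
            + "<property><name>hadoop.tmp.dir</name><value>/opt/hadoop-datastore</value></property>"
            + "<property><name>mapred.job.tracker</name><value>master:54311</value></property>"
            + "</configuration>";
        Map<String, String> props = properties(xml);
        // ASSUMPTION: an unset mapred.local.dir falls back to ${hadoop.tmp.dir}/mapred/local.
        String localDir = props.getOrDefault("mapred.local.dir",
            props.get("hadoop.tmp.dir") + "/mapred/local");
        System.out.println("mapred.local.dir = " + localDir);
    }
}
```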
> Errors:
>
> WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200810301206_0004_m_000001_0,0) failed :
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200810301206_0004/attempt_200810301206_0004_m_000001_0/output/file.out.index in any of the configured local directories
> 	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
> 	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)...
>
> and in the userlog of the attempt:
>
> 2008-10-30 12:28:00,806 WARN org.apache.hadoop.mapred.ReduceTask: java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_200810301206_0004&map=attempt_200810301206_0004_m_000001_0&reduce=0
> 	at sun.reflect.GeneratedConstructorAccessor3.newInstance(Unknown Source)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>
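The URL in that userlog entry names the TaskTracker the reducer asked for map output. Splitting it apart shows the request went to localhost on port 50060 for map attempt attempt_200810301206_0004_m_000001_0. Presumably that only succeeds if the map attempt ran on the same box as the reducer, which would be consistent with the "Could not find ... in any of the configured local directories" errors above. A minimal sketch using java.net.URI; the class and method names are illustrative:

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: split a map-output fetch URL (as logged by the reduce task) into host,
// port and query parameters, to see which TaskTracker the reducer was asking.
public final class MapOutputUrl {
    private MapOutputUrl() {}

    static Map<String, String> queryParams(URI uri) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = uri.getQuery();
        if (query == null) return params;
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) params.put(pair, "");
            else params.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://localhost:50060/mapOutput"
            + "?job=job_200810301206_0004&map=attempt_200810301206_0004_m_000001_0&reduce=0");
        System.out.println("host=" + uri.getHost() + " port=" + uri.getPort() + " path=" + uri.getPath());
        queryParams(uri).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```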

