hadoop-common-issues mailing list archives

From "Tom Wilcox (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-6958) org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache
Date Tue, 01 Nov 2011 15:05:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141221#comment-13141221 ]

Tom Wilcox commented on HADOOP-6958:
------------------------------------

The answer to this problem (for me at least) lay in the contents of the /etc/hosts file.

After reading this:

http://www.mail-archive.com/core-user@hadoop.apache.org/msg03635.html

I did this:

Added lines to my /etc/hosts file so it looked like this:


127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.0.0.235      namenode secondarynamenode
10.0.0.236      jobtracker
10.0.0.237      slave0
10.0.0.238      slave1
10.0.0.239      slave2

Then on each machine I overwrote the /etc/hosts file with the contents above and ran
whichever of the following commands matched that machine:

sudo hostname namenode (on 10.0.0.235)
OR 
sudo hostname jobtracker (on 10.0.0.236)
OR 
sudo hostname slave0 (on 10.0.0.237)
OR 
sudo hostname slave1 (on 10.0.0.238)
OR 
sudo hostname slave2 (on 10.0.0.239)

Then I rebooted all machines in the cluster, disabled the firewalls on all and reran the job
successfully!
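
For anyone repeating these steps, here is a minimal sketch for sanity-checking the hosts file before copying it to each machine. It assumes (my reading, not something stated in this issue) that the goal is for every cluster hostname to map to exactly one non-loopback address. The helper names parse_hosts and check_cluster_names are hypothetical, and the hosts text is simply the file shown above.

```python
import ipaddress


def parse_hosts(text):
    """Return {hostname: [addresses]} parsed from /etc/hosts-style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, []).append(address)
    return mapping


def check_cluster_names(text, names):
    """Return a list of problems found for the given cluster hostnames."""
    mapping = parse_hosts(text)
    problems = []
    for name in names:
        addresses = mapping.get(name, [])
        if not addresses:
            problems.append(f"{name}: no entry")
        elif len(set(addresses)) > 1:
            problems.append(f"{name}: multiple addresses {addresses}")
        elif ipaddress.ip_address(addresses[0]).is_loopback:
            problems.append(f"{name}: maps to loopback {addresses[0]}")
    return problems


# The hosts file from this comment.
HOSTS_TEXT = """\
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.0.0.235      namenode secondarynamenode
10.0.0.236      jobtracker
10.0.0.237      slave0
10.0.0.238      slave1
10.0.0.239      slave2
"""

CLUSTER_NAMES = ["namenode", "secondarynamenode", "jobtracker",
                 "slave0", "slave1", "slave2"]

if __name__ == "__main__":
    for problem in check_cluster_names(HOSTS_TEXT, CLUSTER_NAMES):
        print(problem)
```

To check a live machine instead, read the text from /etc/hosts and pass it to check_cluster_names. An empty result only means the file is consistent with the assumption above; it does not prove that the hostname or firewall settings are correct.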
                
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6958
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6958
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>         Environment: linux
> jdk1.6.0_20
> hadoop 0.20.2
>            Reporter: mazhiyong
>             Fix For: 0.20.2
>
>
> Hello,
>   I am using hadoop-0.20.2, with Hadoop running as a semi-cluster on a single server, and the data is only 800M.
>   The problem is that after Hadoop has been running for a while (more than 1 hour), it stops working. I looked in the log and found this exception: "INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201009161411_0368/attempt_201009161411_0368_m_000002_0/output/file.out in any of the configured local directories"
>   I searched many blogs and web pages but could neither understand why this happens nor find a solution. What does this error message mean, and how can I avoid it? Any suggestions?
>   I have been stuck on this problem for a week already. Please share if you know what could be causing this. Thanks in advance!
> Configuration File:
> <!--hadoop-site.xml-->	
> 	<configuration>
>    <property>
>      <name>mapred.child.tmp</name>
>      <value>/data/hadoop-tmp</value>
>    </property>
>    <property>
>      <name>hadoop.tmp.dir</name>
>      <value>/data/hadoop-tmp</value>
>    </property>
>    <property>
>      <name>mapred.local.dir</name>
>      <value>/data/hadoop-tmp</value>
>    </property>
>  </configuration>
> 	
> <!--core-site.xml-->
> 	<configuration>
>    <property>
>     <name>fs.default.name</name>
>     <value>hdfs://10.0.0.8:8020</value>
>    </property>
>   </configuration>
>  
> <!--mapred-site.xml--> 
>   <configuration>
>    <property>
>      <name>mapred.job.tracker</name>
>      <value>10.0.0.8:8021</value>
>    </property>
>  </configuration>
>   
> <!--hdfs-site.xml-->
>   <configuration>
>    <property>
>     <name>dfs.name.dir</name>
>     <value>/data/name</value>
>    </property>
>    <property>
>     <name>dfs.data.dir</name>
>     <value>/data/data</value>  
>    </property>
>    <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>    </property>
>   </configuration>
> ERROR Logs:
> INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201009161411_0368_r_000000_0 task's state:UNASSIGNED
> INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201009161411_0368_r_000000_0
> INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201009161411_0368_r_000000_0
> INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201009161411_0368_r_1871094354
> INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201009161411_0368_r_1871094354 spawned.
> INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201009161411_0368_r_1871094354 given task: attempt_201009161411_0368_r_000000_0
> INFO org.apache.hadoop.mapred.TaskTracker: Sent out 381650 bytes for reduce: 0 from map: attempt_201009161411_0368_m_000000_0 given 381650/381646
> INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.0.0.8:50060, dest: 10.0.0.8:58884, bytes: 381650, op: MAPRED_SHUFFLE, cliID: attempt_201009161411_0368_m_000000_0
> INFO org.apache.hadoop.mapred.TaskTracker: Sent out 384812 bytes for reduce: 0 from map: attempt_201009161411_0368_m_000001_0 given 384812/384808
> INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.0.0.8:50060, dest: 10.0.0.8:58884, bytes: 384812, op: MAPRED_SHUFFLE, cliID: attempt_201009161411_0368_m_000001_0
> INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_r_000000_0 0.16666667% reduce > copy (1 of 2 at 0.06 MB/s) >
> INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_r_000000_0 0.16666667% reduce > copy (1 of 2 at 0.06 MB/s) >
> INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_r_000000_0 0.16666667% reduce > copy (1 of 2 at 0.06 MB/s) >
> INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201009161411_0368_r_000000_0 is in commit-pending, task state:COMMIT_PENDING
> INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_r_000000_0 0.16666667% reduce > copy (1 of 2 at 0.06 MB/s) >
> INFO org.apache.hadoop.mapred.TaskTracker: Received commit task action for attempt_201009161411_0368_r_000000_0
> INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_r_000000_0 1.0% reduce > reduce
> INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201009161411_0368_r_000000_0 is done.
> INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201009161411_0368_r_000000_0 was 0
> INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
> INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201009161411_0368_r_1871094354 exited. Number of tasks it ran: 1
> INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201009161411_0368_m_000002_0 task's state:UNASSIGNED
> INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201009161411_0368_m_000002_0
> INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201009161411_0368_r_000000_0
> INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201009161411_0368_m_000002_0
> INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201009161411_0368_r_000000_0
> INFO org.apache.hadoop.mapred.TaskRunner: attempt_201009161411_0368_r_000000_0 done; removing files.
> INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201009161411_0368_m_2026394863
> INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201009161411_0368_m_2026394863 spawned.
> INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201009161411_0368_m_2026394863 given task: attempt_201009161411_0368_m_000002_0
> INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_m_000002_0 0.0%
> INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_m_000002_0 0.0% cleanup
> INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201009161411_0368_m_000002_0 is done.
> INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201009161411_0368_m_000002_0 was 0
> INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
> INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201009161411_0368_m_2026394863 exited. Number of tasks it ran: 1
> INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201009161411_0368/attempt_201009161411_0368_m_000002_0/output/file.out in any of the configured local directories
> INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201009161411_0368
> INFO org.apache.hadoop.mapred.TaskRunner: attempt_201009161411_0368_m_000000_0 done; removing files.
> INFO org.apache.hadoop.mapred.TaskRunner: attempt_201009161411_0368_m_000002_0 done; removing files.
> INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201009161411_0368_m_000002_0 not found in cache
> INFO org.apache.hadoop.mapred.TaskRunner: attempt_201009161411_0368_m_000001_0 done; removing files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
