Message-ID: <19175025.279851284866374245.JavaMail.jira@thor>
Date: Sat, 18 Sep 2010 23:19:34 -0400 (EDT)
From: "mazhiyong (JIRA)"
To: common-dev@hadoop.apache.org
Reply-To: common-dev@hadoop.apache.org
Subject: [jira] Created: (HADOOP-6958) org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache
------------------------------------------------------------------------------------------

Key: HADOOP-6958
URL:
https://issues.apache.org/jira/browse/HADOOP-6958
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 0.20.2
Environment: linux jdk1.6.0_20 hadoop 0.20.2
Reporter: mazhiyong
Fix For: 0.20.2

Hello,

I am running hadoop-0.20.2 as a semi-cluster on a single server, and the data is only 800 MB. The problem is that after Hadoop has been running for a while (more than 1 hour), it stops working. I looked in the log and found this exception:

"INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201009161411_0368/attempt_201009161411_0368_m_000002_0/output/file.out in any of the configured local directories"

I searched many blogs and web pages, but I could neither understand why this happens nor find a solution. What does this error message mean, and how can I avoid it? Any suggestions?

I have been stuck on this problem for a week already. Please share if you know what could be causing it. Thanks in advance!

Configuration File:

mapred.child.tmp      /data/hadoop-tmp
hadoop.tmp.dir        /data/hadoop-tmp
mapred.local.dir      /data/hadoop-tmp
fs.default.name       hdfs://10.0.0.8:8020
mapred.job.tracker    10.0.0.8:8021
dfs.name.dir          /data/name
dfs.data.dir          /data/data
dfs.replication       1

ERROR Logs:

INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201009161411_0368_r_000000_0 task's state:UNASSIGNED
INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201009161411_0368_r_000000_0
INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201009161411_0368_r_000000_0
INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201009161411_0368_r_1871094354
INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201009161411_0368_r_1871094354 spawned.
INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201009161411_0368_r_1871094354 given task: attempt_201009161411_0368_r_000000_0
INFO org.apache.hadoop.mapred.TaskTracker: Sent out 381650 bytes for reduce: 0 from map: attempt_201009161411_0368_m_000000_0 given 381650/381646
INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.0.0.8:50060, dest: 10.0.0.8:58884, bytes: 381650, op: MAPRED_SHUFFLE, cliID: attempt_201009161411_0368_m_000000_0
INFO org.apache.hadoop.mapred.TaskTracker: Sent out 384812 bytes for reduce: 0 from map: attempt_201009161411_0368_m_000001_0 given 384812/384808
INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.0.0.8:50060, dest: 10.0.0.8:58884, bytes: 384812, op: MAPRED_SHUFFLE, cliID: attempt_201009161411_0368_m_000001_0
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_r_000000_0 0.16666667% reduce > copy (1 of 2 at 0.06 MB/s) >
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_r_000000_0 0.16666667% reduce > copy (1 of 2 at 0.06 MB/s) >
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_r_000000_0 0.16666667% reduce > copy (1 of 2 at 0.06 MB/s) >
INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201009161411_0368_r_000000_0 is in commit-pending, task state:COMMIT_PENDING
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_r_000000_0 0.16666667% reduce > copy (1 of 2 at 0.06 MB/s) >
INFO org.apache.hadoop.mapred.TaskTracker: Received commit task action for attempt_201009161411_0368_r_000000_0
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_r_000000_0 1.0% reduce > reduce
INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201009161411_0368_r_000000_0 is done.
INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201009161411_0368_r_000000_0 was 0
INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201009161411_0368_r_1871094354 exited. Number of tasks it ran: 1
INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201009161411_0368_m_000002_0 task's state:UNASSIGNED
INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201009161411_0368_m_000002_0
INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201009161411_0368_r_000000_0
INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201009161411_0368_m_000002_0
INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201009161411_0368_r_000000_0
INFO org.apache.hadoop.mapred.TaskRunner: attempt_201009161411_0368_r_000000_0 done; removing files.
INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201009161411_0368_m_2026394863
INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201009161411_0368_m_2026394863 spawned.
INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201009161411_0368_m_2026394863 given task: attempt_201009161411_0368_m_000002_0
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_m_000002_0 0.0%
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201009161411_0368_m_000002_0 0.0% cleanup
INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201009161411_0368_m_000002_0 is done.
INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201009161411_0368_m_000002_0 was 0
INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201009161411_0368_m_2026394863 exited. Number of tasks it ran: 1
INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201009161411_0368/attempt_201009161411_0368_m_000002_0/output/file.out in any of the configured local directories
INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201009161411_0368
INFO org.apache.hadoop.mapred.TaskRunner: attempt_201009161411_0368_m_000000_0 done; removing files.
INFO org.apache.hadoop.mapred.TaskRunner: attempt_201009161411_0368_m_000002_0 done; removing files.
INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201009161411_0368_m_000002_0 not found in cache
INFO org.apache.hadoop.mapred.TaskRunner: attempt_201009161411_0368_m_000001_0 done; removing files.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.