hadoop-common-dev mailing list archives

From "chandravadana (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4148) DiskChecker$DiskErrorException
Date Thu, 11 Sep 2008 12:23:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630172#action_12630172 ]

chandravadana commented on HADOOP-4148:
---------------------------------------


Job tracker's log:

2008-09-11 13:09:09,663 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task tip_200809111304_0002_m_000000
2008-09-11 13:09:09,664 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'task_200809111304_0002_m_000000_0' to tip
tip_200809111304_0002_m_000000, for tracker
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:09:09,949 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task tip_200809111304_0002_m_000001
2008-09-11 13:09:09,963 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'task_200809111304_0002_m_000001_0' to tip
tip_200809111304_0002_m_000001, for tracker
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:09:11,998 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'task_200809111304_0002_m_000000_0' has completed
tip_200809111304_0002_m_000000 successfully.
2008-09-11 13:09:12,001 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task tip_200809111304_0002_m_000002
2008-09-11 13:09:12,002 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'task_200809111304_0002_m_000002_0' to tip
tip_200809111304_0002_m_000002, for tracker
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:09:12,807 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'task_200809111304_0002_m_000001_0' has completed
tip_200809111304_0002_m_000001 successfully.
2008-09-11 13:09:12,809 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task tip_200809111304_0002_m_000003
2008-09-11 13:09:12,810 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'task_200809111304_0002_m_000003_0' to tip
tip_200809111304_0002_m_000003, for tracker
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:09:13,168 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task tip_200809111304_0002_m_000004
2008-09-11 13:09:13,169 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'task_200809111304_0002_m_000004_0' to tip
tip_200809111304_0002_m_000004, for tracker
'tracker_localhost:localhost/127.0.0.1:38957'
2008-09-11 13:09:13,849 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'task_200809111304_0002_m_000002_0' has completed
tip_200809111304_0002_m_000002 successfully.
2008-09-11 13:09:13,852 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'task_200809111304_0002_r_000000_0' to tip
tip_200809111304_0002_r_000000, for tracker
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:09:14,696 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'task_200809111304_0002_m_000003_0' has completed
tip_200809111304_0002_m_000003 successfully.
2008-09-11 13:09:14,996 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'task_200809111304_0002_m_000004_0' has completed
tip_200809111304_0002_m_000004 successfully.
2008-09-11 13:11:59,832 INFO org.apache.hadoop.mapred.JobInProgress:
Failed fetch notification #1 for task task_200809111304_0002_m_000004_0
2008-09-11 13:16:59,385 INFO org.apache.hadoop.mapred.JobInProgress:
Failed fetch notification #2 for task task_200809111304_0002_m_000004_0
2008-09-11 13:22:04,659 INFO org.apache.hadoop.mapred.JobInProgress:
Failed fetch notification #3 for task task_200809111304_0002_m_000004_0
2008-09-11 13:22:04,659 INFO org.apache.hadoop.mapred.JobInProgress: Too
many fetch-failures for output of task:
task_200809111304_0002_m_000004_0 ... killing it
2008-09-11 13:22:04,659 INFO org.apache.hadoop.mapred.TaskInProgress:
Error from task_200809111304_0002_m_000004_0: Too many fetch-failures
2008-09-11 13:22:04,660 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task tip_200809111304_0002_m_000004
2008-09-11 13:22:04,660 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'task_200809111304_0002_m_000004_1' to tip
tip_200809111304_0002_m_000004, for tracker
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:05,259 INFO org.apache.hadoop.mapred.JobTracker:
Removed completed task 'task_200809111304_0002_m_000004_0' from
'tracker_localhost:localhost/127.0.0.1:38957'
2008-09-11 13:22:06,496 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'task_200809111304_0002_m_000004_1' has completed
tip_200809111304_0002_m_000004 successfully.
2008-09-11 13:22:11,228 INFO org.apache.hadoop.mapred.TaskRunner: Saved
output of task 'task_200809111304_0002_r_000000_0' to
hdfs://master:54310/user/root/b2
2008-09-11 13:22:11,228 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'task_200809111304_0002_r_000000_0' has completed
tip_200809111304_0002_r_000000 successfully.
2008-09-11 13:22:11,241 INFO org.apache.hadoop.mapred.JobInProgress: Job
job_200809111304_0002 has completed successfully.
2008-09-11 13:22:14,466 INFO org.apache.hadoop.mapred.JobTracker:
Removed completed task 'task_200809111304_0002_m_000004_0' from
'tracker_localhost:localhost/127.0.0.1:38957'
2008-09-11 13:22:15,294 INFO org.apache.hadoop.mapred.JobTracker:
Removed completed task 'task_200809111304_0002_m_000000_0' from
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:15,294 INFO org.apache.hadoop.mapred.JobTracker:
Removed completed task 'task_200809111304_0002_m_000001_0' from
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:15,294 INFO org.apache.hadoop.mapred.JobTracker:
Removed completed task 'task_200809111304_0002_m_000002_0' from
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:15,294 INFO org.apache.hadoop.mapred.JobTracker:
Removed completed task 'task_200809111304_0002_m_000003_0' from
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:15,295 INFO org.apache.hadoop.mapred.JobTracker:
Removed completed task 'task_200809111304_0002_m_000004_1' from
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 13:22:15,295 INFO org.apache.hadoop.mapred.JobTracker:
Removed completed task 'task_200809111304_0002_r_000000_0' from
'tracker_master:localhost.localdomain/127.0.0.1:38315'
2008-09-11 14:25:56,282 INFO org.apache.hadoop.mapred.JobTracker:
Serious problem.  While updating status, cannot find taskid
task_200809111304_0002_m_000004_0

The job finally completed after 43 min 58 sec, for just 5 inputs.
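To see where the time goes, the spacing of the "Failed fetch notification" entries in the job tracker's log can be measured. The following is a minimal sketch, assuming log lines in the wrapped form pasted above; the helper names `unwrap` and `fetch_failures` are hypothetical and not part of Hadoop.

```python
import re
from datetime import datetime

# Timestamp prefix used by the log lines above, e.g. "2008-09-11 13:11:59,832 ".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) ")
FETCH = re.compile(r"Failed fetch notification #(\d+) for task (\S+)")

def unwrap(lines):
    """Join wrapped continuation lines onto the timestamped line before them."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if TS.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records

def fetch_failures(lines):
    """Return {task_id: [datetime, ...]} for every failed-fetch notification."""
    out = {}
    for rec in unwrap(lines):
        m_ts, m_f = TS.match(rec), FETCH.search(rec)
        if m_ts and m_f:
            when = datetime.strptime(m_ts.group(1), "%Y-%m-%d %H:%M:%S")
            out.setdefault(m_f.group(2), []).append(when)
    return out

# Sample copied from the job tracker's log in this comment.
sample = """\
2008-09-11 13:11:59,832 INFO org.apache.hadoop.mapred.JobInProgress:
Failed fetch notification #1 for task task_200809111304_0002_m_000004_0
2008-09-11 13:16:59,385 INFO org.apache.hadoop.mapred.JobInProgress:
Failed fetch notification #2 for task task_200809111304_0002_m_000004_0
2008-09-11 13:22:04,659 INFO org.apache.hadoop.mapred.JobInProgress:
Failed fetch notification #3 for task task_200809111304_0002_m_000004_0
""".splitlines()

failures = fetch_failures(sample)
for task, times in failures.items():
    # Whole-second gaps between consecutive notifications (milliseconds ignored).
    gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
    print(task, "notifications:", len(times), "gaps (s):", gaps)
```

On the three entries above, the notifications are roughly five minutes apart, and the map task is only re-run after the third one.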

Task tracker's log:

2008-09-11 13:09:14,067 INFO org.apache.hadoop.mapred.TaskTracker:
task_200809111304_0002_m_000003_0 1.0%
hdfs://master:54310/user/root/a1/f15:0+10348
2008-09-11 13:09:14,077 INFO org.apache.hadoop.mapred.TaskTracker: Task
task_200809111304_0002_m_000003_0 is done.
2008-09-11 13:09:20,647 INFO org.apache.hadoop.mapred.TaskTracker:
task_200809111304_0002_r_000000_0 0.20000002% reduce > copy (3 of 5 at
0.01 MB/s) > 
2008-09-11 13:09:22,686 WARN org.apache.hadoop.mapred.TaskTracker:
getMapOutput(task_200809111304_0002_m_000004_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200809111304_0002/task_200809111304_0002_m_000004_0/output/file.out.index
in any of the configured local directories
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
	at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2315)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
	at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
	at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
	at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
	at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
	at org.mortbay.http.HttpServer.service(HttpServer.java:954)
	at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
	at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
	at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
	at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
	at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
	at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

2008-09-11 13:09:22,686 WARN org.apache.hadoop.mapred.TaskTracker:
Unknown child with bad map output: task_200809111304_0002_m_000004_0.
Ignored.
2008-09-11 13:09:23,652 INFO org.apache.hadoop.mapred.TaskTracker:
task_200809111304_0002_r_000000_0 0.26666668% reduce > copy (4 of 5 at
0.01 MB/s) > 
2008-09-11 13:09:26,656 INFO org.apache.hadoop.mapred.TaskTracker:
task_200809111304_0002_r_000000_0 0.26666668% reduce > copy (4 of 5 at
0.01 MB/s) > 

This warning was repeated many times, and finally:

2008-09-11 13:22:04,694 WARN
org.apache.hadoop.conf.Configuration: /root/Desktop/hadoop/hadoop-0.17.2.1/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200809111304_0002/job.xml:a
attempt to override final parameter: dfs.replication;  Ignoring.
2008-09-11 13:22:05,879 INFO org.apache.hadoop.mapred.TaskTracker:
task_200809111304_0002_m_000004_1 1.0%
hdfs://master:54310/user/root/a1/f10:0+2309
2008-09-11 13:22:05,888 INFO org.apache.hadoop.mapred.TaskTracker: Task
task_200809111304_0002_m_000004_1 is done.
2008-09-11 13:22:09,445 INFO org.apache.hadoop.mapred.TaskTracker:
task_200809111304_0002_r_000000_0 0.26666668% reduce > copy (4 of 5 at
0.01 MB/s) > 
2008-09-11 13:22:10,522 INFO org.apache.hadoop.mapred.TaskTracker:
task_200809111304_0002_r_000000_0 0.8744081% reduce > reduce
2008-09-11 13:22:10,526 INFO org.apache.hadoop.mapred.TaskTracker: Task
task_200809111304_0002_r_000000_0 is done.
2008-09-11 13:22:15,296 INFO org.apache.hadoop.mapred.TaskTracker:
Received 'KillJobAction' for job: job_200809111304_0002
2008-09-11 13:22:15,296 INFO org.apache.hadoop.mapred.TaskRunner:
task_200809111304_0002_m_000003_0 done; removing files.
2008-09-11 13:22:15,299 INFO org.apache.hadoop.mapred.TaskRunner:
task_200809111304_0002_r_000000_0 done; removing files.
2008-09-11 13:22:15,301 INFO org.apache.hadoop.mapred.TaskRunner:
task_200809111304_0002_m_000002_0 done; removing files.
2008-09-11 13:22:15,304 INFO org.apache.hadoop.mapred.TaskRunner:
task_200809111304_0002_m_000004_1 done; removing files.
2008-09-11 13:22:15,307 INFO org.apache.hadoop.mapred.TaskRunner:
task_200809111304_0002_m_000001_0 done; removing files.
2008-09-11 13:22:15,327 INFO org.apache.hadoop.mapred.TaskRunner:
task_200809111304_0002_m_000000_0 done; removing files.
2008-09-11 15:01:47,975 INFO org.apache.hadoop.mapred.TaskTracker:
SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at master/10.232.25.197
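
One detail visible in the job tracker's log is that both trackers are registered with the address 127.0.0.1 ('tracker_master:localhost.localdomain/127.0.0.1:38315' and 'tracker_localhost:localhost/127.0.0.1:38957'). Whether this is related to the fetch failures is an assumption that the logs alone do not confirm. The sketch below only lists tracker names whose registered address is a loopback address, assuming the quoted 'tracker_NAME:HOST/IP:PORT' form shown in the log; `loopback_trackers` is a hypothetical helper.

```python
import re
import ipaddress

# Matches the quoted tracker identifiers in the job tracker's log,
# e.g. 'tracker_master:localhost.localdomain/127.0.0.1:38315'.
TRACKER = re.compile(r"'(tracker_[^:']+):([^/']+)/(\d+\.\d+\.\d+\.\d+):(\d+)'")

def loopback_trackers(text):
    """Return {tracker_name: (host, ip, port)} for trackers on a loopback IP."""
    found = {}
    for name, host, ip, port in TRACKER.findall(text):
        if ipaddress.ip_address(ip).is_loopback:
            found[name] = (host, ip, int(port))
    return found

# Sample identifiers copied from the job tracker's log in this comment.
sample = """
'tracker_master:localhost.localdomain/127.0.0.1:38315'
'tracker_localhost:localhost/127.0.0.1:38957'
"""

result = loopback_trackers(sample)
for name, (host, ip, port) in sorted(result.items()):
    print(f"{name} registered as {host}/{ip}:{port} (loopback)")
```

Both trackers from this job are reported by the sketch, even though the second one runs on a different machine.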





> DiskChecker$DiskErrorException
> ------------------------------
>
>                 Key: HADOOP-4148
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4148
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.2
>         Environment: 2 systems
> 1- redhat 
> 1- ubuntu
>            Reporter: chandravadana
>            Priority: Blocker
>             Fix For: 0.17.2
>
>
> Hi,
> 1 Red Hat machine: master (jobtracker + namenode + tasktracker + datanode)
> 1 Ubuntu machine: slave (tasktracker + datanode)
> When I execute
>  bin/hadoop jar word/word.jar org.myorg.WordCount in mn2
> 08/09/10 15:12:56 INFO mapred.FileInputFormat: Total input paths to process : 5
> 08/09/10 15:12:56 INFO mapred.JobClient: Running job: job_200809101511_0003
> 08/09/10 15:12:57 INFO mapred.JobClient:  map 0% reduce 0%
> 08/09/10 15:13:00 INFO mapred.JobClient:  map 20% reduce 0%
> 08/09/10 15:13:01 INFO mapred.JobClient:  map 80% reduce 0%
> 08/09/10 15:13:02 INFO mapred.JobClient:  map 100% reduce 0%
> 08/09/10 15:13:11 INFO mapred.JobClient:  map 100% reduce 13%
> 08/09/10 15:30:41 INFO mapred.JobClient:  map 80% reduce 13%
> 08/09/10 15:30:41 INFO mapred.JobClient: Task Id : task_200809101511_0003_m_000000_0, Status : FAILED
> Too many fetch-failures
> 08/09/10 15:30:42 WARN mapred.JobClient: Error reading task outputhttp://localhost:50060/tasklog?plaintext=true&taskid=task_200809101511_0003_m_000000_0&filter=stdout
> 08/09/10 15:30:42 WARN mapred.JobClient: Error reading task outputhttp://localhost:50060/tasklog?plaintext=true&taskid=task_200809101511_0003_m_000000_0&filter=stderr
> 08/09/10 15:30:44 INFO mapred.JobClient:  map 100% reduce 13%
> 08/09/10 15:30:49 INFO mapred.JobClient:  map 100% reduce 20%
> 08/09/10 15:40:52 INFO mapred.JobClient: Task Id : task_200809101511_0003_m_000004_0, Status : FAILED
> Too many fetch-failures
> 08/09/10 15:40:52 WARN mapred.JobClient: Error reading task outputhttp://localhost:50060/tasklog?plaintext=true&taskid=task_200809101511_0003_m_000004_0&filter=stdout
> 08/09/10 15:40:52 WARN mapred.JobClient: Error reading task outputhttp://localhost:50060/tasklog?plaintext=true&taskid=task_200809101511_0003_m_000004_0&filter=stderr
> 08/09/10 15:41:03 INFO mapred.JobClient:  map 100% reduce 26%
> and then it halts.
> When I looked at the tasktracker's log, I found
>  getMapOutput(task_200809101511_0003_m_000004_0,0) failed :
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200809101511_0003/task_200809101511_0003_m_000004_0/output/file.out.index in any of the configured local directories
> 	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:359)
> 	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
> 	at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2315)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> 	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
> 	at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
> 	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
> 	at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
> 	at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
> 	at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
> 	at org.mortbay.http.HttpServer.service(HttpServer.java:954)
> 	at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
> 	at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
> 	at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
> 	at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
> 	at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
> 	at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> 2008-09-10 15:33:12,915 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: task_200809101511_0003_m_000004_0. Ignored.
> 2008-09-10 15:33:17,425 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:33:23,431 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:33:29,437 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:33:32,439 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:33:38,445 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:33:44,451 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:33:47,454 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:33:53,460 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:33:59,465 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:34:02,469 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:34:08,475 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:34:14,480 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:34:17,484 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:34:23,490 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:34:29,495 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> 2008-09-10 15:34:32,498 INFO org.apache.hadoop.mapred.TaskTracker: task_200809101511_0003_r_000000_0 0.20000002% reduce > copy (3 of 5 at 0.00 MB/s) >
> The reducer task runs on the master (Red Hat).
> The task task_200809101511_0003_m_000004_0 named in the log ran on the slave (Ubuntu).
> In the jobtracker's log, I found
> 2008-09-10 15:35:46,977 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200809101511_0003_m_000004_0
> hadoop-site.xml:
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://master:54310/</value>
>     <final>true</final>
>   </property>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>master:54311</value>
>     <final>true</final>
>   </property>
>   <property>
>     <name>dfs.replication</name>
>     <value>2</value>
>     <final>true</final>
>   </property>
>   <property>
>     <name>hadoop.tmp.dir</name>
>     <value>absolute path</value>
>     <final>true</final>
>   </property>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value>-Xmx512M</value>
>     <final>true</final>
>   </property>
>   <property>
>     <name>mapred.speculative.execution</name>
>     <value>false</value>
>     <final>true</final>
>   </property>
> </configuration>
> I don't know where I went wrong.
> Kindly help me solve this.
> Thanks,
> Chandravadana
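
The "attempt to override final parameter: dfs.replication;  Ignoring." warning in the task tracker's log corresponds to the `<final>true</final>` entries in the quoted hadoop-site.xml: a property marked final in the site configuration is not overridden by the job's job.xml. A minimal sketch, assuming a configuration in the same `<property>` layout, that lists which properties are marked final; `final_properties` is a hypothetical helper, not a Hadoop API.

```python
import xml.etree.ElementTree as ET

def final_properties(xml_text):
    """Return {name: value} for every <property> marked <final>true</final>."""
    root = ET.fromstring(xml_text)
    result = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        is_final = (prop.findtext("final") or "").strip().lower() == "true"
        if name and is_final:
            result[name] = value
    return result

# Two of the properties from the quoted hadoop-site.xml.
sample = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.speculative.execution</name>
    <value>false</value>
    <final>true</final>
  </property>
</configuration>"""

finals = final_properties(sample)
print(finals)
```

Run against the full file, this would show that every property in the quoted configuration is final, so the warning itself is expected and separate from the DiskErrorException.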
