hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin H├Ąger <martin.ha...@byburt.com>
Subject Reduce step never starts, can't read output from mappers? (Too many fetch-failures)
Date Wed, 24 Feb 2010 13:31:50 GMT
Hello,

I have set up a cluster with one NameNode/JobTracker and three
DataNode/TaskTrackers, and having some issues with the reduce step
being unable to start. Masters and slaves can ping and ssh each other.
Attaching conf files (same on all machines).

Log output for JobTracker and one of the TaskTrackers that seems suspicious:

JobTracker
=========

exjobb@exjobb-1:~$ hadoop jar /opt/hadoop/hadoop-0.20.1-examples.jar
wordcount input/sessions-20100205145800.txt output-wordcount
10/02/24 11:15:24 INFO input.FileInputFormat: Total input paths to process : 1
10/02/24 11:15:25 INFO mapred.JobClient: Running job: job_201002240852_0003
10/02/24 11:15:26 INFO mapred.JobClient:  map 0% reduce 0%
10/02/24 11:15:49 INFO mapred.JobClient:  map 1% reduce 0%
10/02/24 11:15:58 INFO mapred.JobClient:  map 2% reduce 0%
10/02/24 11:16:06 INFO mapred.JobClient:  map 3% reduce 0%
10/02/24 11:16:15 INFO mapred.JobClient:  map 4% reduce 0%
10/02/24 11:16:23 INFO mapred.JobClient:  map 5% reduce 0%
10/02/24 11:16:32 INFO mapred.JobClient:  map 6% reduce 0%
10/02/24 11:16:40 INFO mapred.JobClient:  map 7% reduce 0%
10/02/24 11:16:51 INFO mapred.JobClient:  map 8% reduce 0%
10/02/24 11:16:59 INFO mapred.JobClient:  map 9% reduce 0%
10/02/24 11:17:07 INFO mapred.JobClient:  map 10% reduce 0%
10/02/24 11:17:31 INFO mapred.JobClient:  map 11% reduce 0%
10/02/24 11:17:39 INFO mapred.JobClient:  map 12% reduce 0%
10/02/24 11:17:49 INFO mapred.JobClient:  map 13% reduce 0%
10/02/24 11:17:57 INFO mapred.JobClient:  map 14% reduce 0%
10/02/24 11:18:05 INFO mapred.JobClient:  map 15% reduce 0%
10/02/24 11:18:15 INFO mapred.JobClient:  map 16% reduce 0%
10/02/24 11:18:23 INFO mapred.JobClient:  map 17% reduce 0%
10/02/24 11:18:32 INFO mapred.JobClient:  map 18% reduce 0%
10/02/24 11:18:42 INFO mapred.JobClient:  map 19% reduce 0%
10/02/24 11:18:51 INFO mapred.JobClient:  map 20% reduce 0%
10/02/24 11:19:11 INFO mapred.JobClient:  map 21% reduce 0%
10/02/24 11:19:22 INFO mapred.JobClient:  map 22% reduce 0%
10/02/24 11:19:32 INFO mapred.JobClient:  map 23% reduce 0%
10/02/24 11:19:40 INFO mapred.JobClient:  map 24% reduce 0%
10/02/24 11:19:49 INFO mapred.JobClient:  map 25% reduce 0%
10/02/24 11:19:57 INFO mapred.JobClient:  map 26% reduce 0%
10/02/24 11:20:05 INFO mapred.JobClient:  map 27% reduce 0%
10/02/24 11:20:15 INFO mapred.JobClient:  map 28% reduce 0%
10/02/24 11:20:24 INFO mapred.JobClient:  map 29% reduce 0%
10/02/24 11:20:34 INFO mapred.JobClient:  map 30% reduce 0%
10/02/24 11:20:52 INFO mapred.JobClient:  map 31% reduce 0%
10/02/24 11:21:02 INFO mapred.JobClient:  map 32% reduce 0%
10/02/24 11:21:12 INFO mapred.JobClient:  map 33% reduce 0%
10/02/24 11:21:21 INFO mapred.JobClient:  map 34% reduce 0%
10/02/24 11:21:31 INFO mapred.JobClient:  map 35% reduce 0%
10/02/24 11:21:40 INFO mapred.JobClient:  map 36% reduce 0%
10/02/24 11:21:49 INFO mapred.JobClient:  map 37% reduce 0%
10/02/24 11:21:58 INFO mapred.JobClient:  map 38% reduce 0%
10/02/24 11:22:07 INFO mapred.JobClient:  map 39% reduce 0%
10/02/24 11:22:17 INFO mapred.JobClient:  map 40% reduce 0%
10/02/24 11:22:35 INFO mapred.JobClient:  map 41% reduce 0%
10/02/24 11:22:44 INFO mapred.JobClient:  map 42% reduce 0%
10/02/24 11:22:53 INFO mapred.JobClient:  map 43% reduce 0%
10/02/24 11:23:05 INFO mapred.JobClient:  map 44% reduce 0%
10/02/24 11:23:14 INFO mapred.JobClient:  map 45% reduce 0%
10/02/24 11:23:22 INFO mapred.JobClient:  map 46% reduce 0%
10/02/24 11:23:32 INFO mapred.JobClient:  map 47% reduce 0%
10/02/24 11:23:40 INFO mapred.JobClient:  map 48% reduce 0%
10/02/24 11:23:50 INFO mapred.JobClient:  map 49% reduce 0%
10/02/24 11:23:59 INFO mapred.JobClient:  map 50% reduce 0%
10/02/24 11:24:17 INFO mapred.JobClient:  map 51% reduce 0%
10/02/24 11:24:27 INFO mapred.JobClient:  map 52% reduce 0%
10/02/24 11:24:34 INFO mapred.JobClient:  map 53% reduce 0%
10/02/24 11:24:45 INFO mapred.JobClient:  map 54% reduce 0%
10/02/24 11:24:57 INFO mapred.JobClient:  map 55% reduce 0%
10/02/24 11:25:04 INFO mapred.JobClient:  map 56% reduce 0%
10/02/24 11:25:15 INFO mapred.JobClient:  map 57% reduce 0%
10/02/24 11:25:22 INFO mapred.JobClient:  map 58% reduce 0%
10/02/24 11:25:32 INFO mapred.JobClient:  map 59% reduce 0%
10/02/24 11:25:42 INFO mapred.JobClient:  map 60% reduce 0%
10/02/24 11:25:57 INFO mapred.JobClient:  map 61% reduce 0%
10/02/24 11:26:07 INFO mapred.JobClient:  map 62% reduce 0%
10/02/24 11:26:16 INFO mapred.JobClient:  map 63% reduce 0%
10/02/24 11:26:24 INFO mapred.JobClient:  map 64% reduce 0%
10/02/24 11:26:34 INFO mapred.JobClient:  map 65% reduce 0%
10/02/24 11:26:45 INFO mapred.JobClient:  map 66% reduce 0%
10/02/24 11:26:56 INFO mapred.JobClient:  map 67% reduce 0%
10/02/24 11:27:05 INFO mapred.JobClient:  map 68% reduce 0%
10/02/24 11:27:13 INFO mapred.JobClient:  map 69% reduce 0%
10/02/24 11:27:17 INFO mapred.JobClient: Task Id :
attempt_201002240852_0003_m_000000_0, Status : FAILED
Too many fetch-failures
10/02/24 11:27:18 WARN mapred.JobClient: Error reading task
outputhttp://localhost:50060/tasklog?plaintext=true&taskid=attempt_201002240852_0003_m_000000_0&filter=stdout
10/02/24 11:27:18 WARN mapred.JobClient: Error reading task
outputhttp://localhost:50060/tasklog?plaintext=true&taskid=attempt_201002240852_0003_m_000000_0&filter=stderr
10/02/24 11:27:24 INFO mapred.JobClient:  map 70% reduce 0%
10/02/24 11:27:25 INFO mapred.JobClient:  map 68% reduce 0%
10/02/24 11:27:28 INFO mapred.JobClient:  map 69% reduce 0%
10/02/24 11:27:43 INFO mapred.JobClient:  map 70% reduce 0%
10/02/24 11:27:52 INFO mapred.JobClient:  map 71% reduce 0%
10/02/24 11:28:01 INFO mapred.JobClient:  map 72% reduce 0%
10/02/24 11:28:09 INFO mapred.JobClient:  map 73% reduce 0%
10/02/24 11:28:18 INFO mapred.JobClient:  map 74% reduce 0%
10/02/24 11:28:28 INFO mapred.JobClient:  map 75% reduce 0%
10/02/24 11:28:40 INFO mapred.JobClient:  map 76% reduce 0%
10/02/24 11:28:49 INFO mapred.JobClient:  map 77% reduce 0%
10/02/24 11:28:58 INFO mapred.JobClient:  map 78% reduce 0%
10/02/24 11:29:11 INFO mapred.JobClient:  map 79% reduce 0%
10/02/24 11:29:24 INFO mapred.JobClient:  map 80% reduce 0%
10/02/24 11:29:33 INFO mapred.JobClient:  map 81% reduce 0%
10/02/24 11:29:42 INFO mapred.JobClient:  map 82% reduce 0%
10/02/24 11:29:51 INFO mapred.JobClient:  map 83% reduce 0%
10/02/24 11:30:00 INFO mapred.JobClient:  map 84% reduce 0%
10/02/24 11:30:10 INFO mapred.JobClient:  map 85% reduce 0%
10/02/24 11:30:18 INFO mapred.JobClient:  map 86% reduce 0%
10/02/24 11:30:30 INFO mapred.JobClient:  map 87% reduce 0%
10/02/24 11:30:39 INFO mapred.JobClient:  map 88% reduce 0%
10/02/24 11:30:52 INFO mapred.JobClient:  map 89% reduce 0%
10/02/24 11:31:06 INFO mapred.JobClient:  map 90% reduce 0%
10/02/24 11:31:15 INFO mapred.JobClient:  map 91% reduce 0%
10/02/24 11:31:23 INFO mapred.JobClient:  map 92% reduce 0%
10/02/24 11:31:33 INFO mapred.JobClient:  map 93% reduce 0%
10/02/24 11:31:42 INFO mapred.JobClient:  map 94% reduce 0%
10/02/24 11:31:50 INFO mapred.JobClient:  map 95% reduce 0%
10/02/24 11:31:59 INFO mapred.JobClient:  map 96% reduce 0%
10/02/24 11:32:08 INFO mapred.JobClient:  map 97% reduce 0%
10/02/24 11:32:20 INFO mapred.JobClient:  map 98% reduce 0%
10/02/24 11:32:34 INFO mapred.JobClient:  map 99% reduce 0%
10/02/24 11:33:01 INFO mapred.JobClient:  map 100% reduce 0%
10/02/24 11:39:37 INFO mapred.JobClient: Task Id :
attempt_201002240852_0003_m_000001_0, Status : FAILED
Too many fetch-failures
10/02/24 11:39:37 WARN mapred.JobClient: Error reading task
outputhttp://localhost:50060/tasklog?plaintext=true&taskid=attempt_201002240852_0003_m_000001_0&filter=stdout
10/02/24 11:39:37 WARN mapred.JobClient: Error reading task
outputhttp://localhost:50060/tasklog?plaintext=true&taskid=attempt_201002240852_0003_m_000001_0&filter=stderr
10/02/24 11:39:41 INFO mapred.JobClient:  map 98% reduce 0%
10/02/24 11:39:53 INFO mapred.JobClient:  map 99% reduce 0%
10/02/24 11:40:29 INFO mapred.JobClient:  map 100% reduce 0%

TaskTracker
===========

.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_201002240852_0003/attempt_201002240852_0003_m_000001_0/output/file.out.index
in any of the configured local directories
2010-02-24 11:39:34,399 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_r_000000_0 0.0% reduce > copy >
2010-02-24 11:39:36,831 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction (registerTask): attempt_201002240852_0003_m_000001_1
task's state:UNASSIGNED
2010-02-24 11:39:36,831 INFO org.apache.hadoop.mapred.TaskTracker:
Trying to launch : attempt_201002240852_0003_m_000001_1
2010-02-24 11:39:36,831 INFO org.apache.hadoop.mapred.TaskTracker: In
TaskLauncher, current free slots : 2 and trying to launch
attempt_201002240852_0003_m_000001_1
2010-02-24 11:39:36,909 INFO org.apache.hadoop.mapred.JvmManager: In
JvmRunner constructed JVM ID: jvm_201002240852_0003_m_1482385716
2010-02-24 11:39:36,909 INFO org.apache.hadoop.mapred.JvmManager: JVM
Runner jvm_201002240852_0003_m_1482385716 spawned.
2010-02-24 11:39:37,408 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_r_000000_0 0.0% reduce > copy >
2010-02-24 11:39:37,908 INFO org.apache.hadoop.mapred.TaskTracker: JVM
with ID: jvm_201002240852_0003_m_1482385716 given task:
attempt_201002240852_0003_m_000001_1
2010-02-24 11:39:43,462 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_r_000000_0 0.0% reduce > copy >
2010-02-24 11:39:44,361 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_m_000001_1 0.12414845%
2010-02-24 11:39:47,480 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_m_000001_1 0.1880129%
2010-02-24 11:39:49,477 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_r_000000_0 0.0% reduce > copy >
2010-02-24 11:39:50,482 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_m_000001_1 0.2625587%
2010-02-24 11:39:52,511 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_r_000000_0 0.0% reduce > copy >
2010-02-24 11:39:53,517 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_m_000001_1 0.33351037%
2010-02-24 11:39:56,540 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_m_000001_1 0.39943638%
2010-02-24 11:39:58,545 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_r_000000_0 0.0% reduce > copy >
2010-02-24 11:39:59,545 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_m_000001_1 0.47682634%
2010-02-24 11:40:02,558 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_m_000001_1 0.5435813%
2010-02-24 11:40:04,614 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_r_000000_0 0.0% reduce > copy >
2010-02-24 11:40:05,584 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_m_000001_1 0.6098209%

(snip)

2010-02-24 11:41:53,069 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_r_000000_0 0.0% reduce > copy >
2010-02-24 11:41:59,089 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_r_000000_0 0.0% reduce > copy >
2010-02-24 11:42:04,820 INFO org.apache.hadoop.mapred.TaskTracker:
Sent out 9261429 bytes for reduce: 0 from map:
attempt_201002240852_0003_m_000004_0 given 9261429/9261425
2010-02-24 11:42:04,821 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src:
127.0.0.1:50060, dest: 127.0.0.1:33145, bytes: 9261429, op:
MAPRED_SHUFFLE, cliID: attempt_201002240852_0003_m_000004_0
2010-02-24 11:42:05,112 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201002240852_0003_r_000000_0 0.0042735045% reduce > copy (1 of
78 at 0.01 MB/s) >
2010-02-24 11:42:05,490 INFO org.apache.hadoop.mapred.TaskTracker:
Sent out 9227741 bytes for reduce: 0 from map:
attempt_201002240852_0003_m_000005_0 given 9227741/9227737
2010-02-24 11:42:05,490 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src:
127.0.0.1:50060, dest: 127.0.0.1:33145, bytes: 9227741, op:
MAPRED_SHUFFLE, cliID: attempt_201002240852_0003_m_000005_0
2010-02-24 11:42:05,613 WARN org.apache.hadoop.mapred.TaskTracker:
getMapOutput(attempt_201002240852_0003_m_000006_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_201002240852_0003/attempt_201002240852_0003_m_000006_0/output/file.out.index
in any of the configured local directories
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
	at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2886)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:324)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

Mime
View raw message