hadoop-user mailing list archives

From Björn-Elmar Macek <ma...@cs.uni-kassel.de>
Subject Re: DataNode and TaskTracker communication
Date Thu, 16 Aug 2012 13:17:49 GMT
Hello again,

well, I have by now more or less ruled out that the communication 
problems are related to the infrastructure. Instead, in a new run of my 
program I found that, for reasons I have not been able to track down, 
the namenode and the jobtracker stop their services after too many map 
tasks have failed. See the logs below.
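
As far as I understand it, two limits are involved in what the logs show. 
Here is a sketch against the old JobConf API with what I believe are the 
default values (an assumption on my part; I have not changed these 
settings myself, and this is not my actual driver code):

import org.apache.hadoop.mapred.JobConf;

public class JobLimitsSketch {
    // Hedged sketch: the two mapred settings that, as far as I know,
    // produce the behaviour visible in the logs below.
    public static void applyAssumedDefaults(JobConf conf) {
        // A task that reports no progress for this many milliseconds is
        // killed ("failed to report status for 600 seconds. Killing!").
        conf.setLong("mapred.task.timeout", 600000L);
        // After this many failed attempts of one map task, the whole job
        // is failed ("TaskInProgress ... has failed 4 times.").
        conf.setInt("mapred.map.max.attempts", 4);
    }
}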

From that point on, of course, the still-running datanodes/tasktrackers 
cannot communicate with the jobtracker/namenode.


What I do not understand is why the map tasks neither report status nor 
fail on their own. I wanted to look this up in the logs, but somehow they 
do not contain anything from before 24:00/0:00 - a time at which the 
master(s) had already been dead for two hours.

Are there any suggestions? Did I perhaps do something wrong in the Mapper?
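
To make the question concrete: here is a minimal sketch of how I 
understand a long-running map() is supposed to keep its task alive by 
reporting progress (old mapred API, as used by my job; the class and the 
per-record work are placeholders, not my actual code):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ProgressReportingMapperSketch extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, LongWritable> output,
                    Reporter reporter) throws IOException {
        // Placeholder for per-record work that may take a long time.
        for (String token : line.toString().split("\\s+")) {
            // ... expensive processing of `token` would go here ...

            // Heartbeat: tells the TaskTracker this task is still alive,
            // so it is not killed after mapred.task.timeout expires.
            reporter.progress();
        }
        reporter.setStatus("processed record at offset " + offset.get());
        output.collect(new Text("records"), new LongWritable(1L));
    }
}

Is calling reporter.progress() (or setStatus()) inside long-running loops 
like this the expected way to avoid the 600-second kill, or is there 
something else I should be doing?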

Regards,
Elmar

#################################### JOBLOG ... LAST LINES 
#################################
Task attempt_201208152128_0001_m_000007_1 failed to report status for 
601 seconds. Killing!
12/08/15 21:50:10 INFO mapred.JobClient:  map 50% reduce 0%
12/08/15 21:50:12 INFO mapred.JobClient:  map 39% reduce 0%
12/08/15 21:50:13 INFO mapred.JobClient:  map 23% reduce 0%
12/08/15 21:50:14 INFO mapred.JobClient:  map 19% reduce 0%
12/08/15 21:50:15 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000014_1, Status : FAILED
Task attempt_201208152128_0001_m_000014_1 failed to report status for 
602 seconds. Killing!
12/08/15 21:50:15 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000015_1, Status : FAILED
Task attempt_201208152128_0001_m_000015_1 failed to report status for 
602 seconds. Killing!
12/08/15 21:50:17 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000004_1, Status : FAILED
Task attempt_201208152128_0001_m_000004_1 failed to report status for 
602 seconds. Killing!
12/08/15 21:50:17 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000005_1, Status : FAILED
Task attempt_201208152128_0001_m_000005_1 failed to report status for 
602 seconds. Killing!
12/08/15 21:50:17 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000012_1, Status : FAILED
Task attempt_201208152128_0001_m_000012_1 failed to report status for 
602 seconds. Killing!
12/08/15 21:50:18 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000008_1, Status : FAILED
Task attempt_201208152128_0001_m_000008_1 failed to report status for 
602 seconds. Killing!
12/08/15 21:50:19 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000009_1, Status : FAILED
Task attempt_201208152128_0001_m_000009_1 failed to report status for 
601 seconds. Killing!
12/08/15 21:50:19 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000000_1, Status : FAILED
Task attempt_201208152128_0001_m_000000_1 failed to report status for 
601 seconds. Killing!
12/08/15 21:50:19 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000010_1, Status : FAILED
Task attempt_201208152128_0001_m_000010_1 failed to report status for 
601 seconds. Killing!
12/08/15 21:50:20 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000002_1, Status : FAILED
Task attempt_201208152128_0001_m_000002_1 failed to report status for 
601 seconds. Killing!
12/08/15 21:50:21 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000003_1, Status : FAILED
Task attempt_201208152128_0001_m_000003_1 failed to report status for 
601 seconds. Killing!
12/08/15 21:50:22 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000013_1, Status : FAILED
Task attempt_201208152128_0001_m_000013_1 failed to report status for 
602 seconds. Killing!
12/08/15 21:50:22 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000001_1, Status : FAILED
Task attempt_201208152128_0001_m_000001_1 failed to report status for 
601 seconds. Killing!
12/08/15 21:50:23 INFO mapred.JobClient:  map 11% reduce 0%
12/08/15 21:50:25 INFO mapred.JobClient:  map 17% reduce 0%
12/08/15 21:50:27 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000006_1, Status : FAILED
Task attempt_201208152128_0001_m_000006_1 failed to report status for 
602 seconds. Killing!
12/08/15 21:50:27 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000013_2, Status : FAILED
Task attempt_201208152128_0001_m_000013_2 failed to report status for 
602 seconds. Killing!
12/08/15 21:50:27 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000001_2, Status : FAILED
Task attempt_201208152128_0001_m_000001_2 failed to report status for 
602 seconds. Killing!
12/08/15 21:50:28 INFO mapred.JobClient:  map 40% reduce 0%
12/08/15 21:50:29 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000011_2, Status : FAILED
Task attempt_201208152128_0001_m_000011_2 failed to report status for 
601 seconds. Killing!
12/08/15 21:50:30 INFO mapred.JobClient:  map 42% reduce 0%
12/08/15 21:50:31 INFO mapred.JobClient:  map 52% reduce 0%
12/08/15 21:50:33 INFO mapred.JobClient:  map 54% reduce 0%
12/08/15 21:50:37 INFO mapred.JobClient:  map 58% reduce 0%
12/08/15 21:50:39 INFO mapred.JobClient:  map 61% reduce 0%
12/08/15 21:50:42 INFO mapred.JobClient:  map 62% reduce 0%
12/08/15 21:50:46 INFO mapred.JobClient:  map 58% reduce 0%
12/08/15 21:50:55 INFO mapred.JobClient:  map 54% reduce 0%
12/08/15 21:50:57 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000011_1, Status : FAILED
Task attempt_201208152128_0001_m_000011_1 failed to report status for 
602 seconds. Killing!
12/08/15 21:52:10 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000006_2, Status : FAILED
Task attempt_201208152128_0001_m_000006_2 failed to report status for 
602 seconds. Killing!
12/08/15 22:00:25 INFO mapred.JobClient: Task Id : 
attempt_201208152128_0001_m_000007_2, Status : FAILED
Task attempt_201208152128_0001_m_000007_2 failed to report status for 
602 seconds. Killing!
12/08/15 22:00:29 INFO mapred.JobClient:  map 50% reduce 0%
12/08/15 22:00:32 INFO mapred.JobClient:  map 46% reduce 0%
12/08/15 22:00:34 INFO mapred.JobClient:  map 39% reduce 0%
12/08/15 22:00:35 INFO mapred.JobClient:  map 38% reduce 0%
12/08/15 22:00:40 INFO mapred.JobClient: Job complete: job_201208152128_0001
12/08/15 22:00:40 INFO mapred.JobClient: Counters: 8
12/08/15 22:00:40 INFO mapred.JobClient:   Job Counters
12/08/15 22:00:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=8861935
12/08/15 22:00:40 INFO mapred.JobClient:     Total time spent by all 
reduces waiting after reserving slots (ms)=0
12/08/15 22:00:40 INFO mapred.JobClient:     Total time spent by all 
maps waiting after reserving slots (ms)=0
12/08/15 22:00:40 INFO mapred.JobClient:     Rack-local map tasks=52
12/08/15 22:00:40 INFO mapred.JobClient:     Launched map tasks=59
12/08/15 22:00:40 INFO mapred.JobClient:     Data-local map tasks=7
12/08/15 22:00:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/08/15 22:00:40 INFO mapred.JobClient:     Failed map tasks=1
12/08/15 22:00:40 INFO mapred.JobClient: Job Failed: # of failed Map 
Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: 
task_201208152128_0001_m_000007
java.io.IOException: Job failed!
     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
     at 
uni.kassel.macek.rtprep.RetweetApplication.run(RetweetApplication.java:76)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at 
uni.kassel.macek.rtprep.RetweetApplication.main(RetweetApplication.java:27)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
     at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:601)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

#################################### NAMENODE ... LAST LINES 
#################################
2012-08-15 21:44:32,976 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* 
NameSystem.processReport: from 141.51.205.30:50010, blocks: 3, 
processing time: 0 msecs
2012-08-15 21:48:44,346 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 
141.51.205.28
2012-08-15 21:48:44,346 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit 
log, edits.new files already exists in all healthy directories:
   /work/bmacek/hdfs/current/edits.new
2012-08-15 21:50:04,135 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of 
transactions: 0 Total time for transactions(ms): 0Number of transactions 
batched in Syncs: 16 Number of syncs: 1 SyncTimes(ms): 45
2012-08-15 21:53:44,355 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 
141.51.205.28
2012-08-15 21:53:44,355 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit 
log, edits.new files already exists in all healthy directories:
   /work/bmacek/hdfs/current/edits.new
2012-08-15 21:56:17,856 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* 
NameSystem.processReport: from 141.51.205.112:50010, blocks: 8, 
processing time: 0 msecs
2012-08-15 21:58:44,362 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 
141.51.205.28
2012-08-15 21:58:44,363 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit 
log, edits.new files already exists in all healthy directories:
   /work/bmacek/hdfs/current/edits.new
2012-08-15 22:00:19,904 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of 
transactions: 0 Total time for transactions(ms): 0Number of transactions 
batched in Syncs: 36 Number of syncs: 1 SyncTimes(ms): 45
2012-08-15 22:00:35,486 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_8707503977973501396 is added to 
invalidSet of 141.51.205.117:50010
2012-08-15 22:00:35,486 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_8707503977973501396 is added to 
invalidSet of 141.51.205.113:50010
2012-08-15 22:00:35,486 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_8707503977973501396 is added to 
invalidSet of 141.51.205.112:50010
2012-08-15 22:00:35,486 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_8707503977973501396 is added to 
invalidSet of 141.51.205.118:50010
2012-08-15 22:00:35,486 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_8707503977973501396 is added to 
invalidSet of 141.51.205.30:50010
2012-08-15 22:00:35,486 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_8707503977973501396 is added to 
invalidSet of 141.51.205.115:50010
2012-08-15 22:00:35,486 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_8707503977973501396 is added to 
invalidSet of 141.51.205.114:50010
2012-08-15 22:00:35,486 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_8707503977973501396 is added to 
invalidSet of 141.51.205.119:50010
2012-08-15 22:00:35,487 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_331880463236233459 is added to 
invalidSet of 141.51.205.115:50010
2012-08-15 22:00:35,487 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_331880463236233459 is added to 
invalidSet of 141.51.205.114:50010
2012-08-15 22:00:35,487 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_331880463236233459 is added to 
invalidSet of 141.51.205.112:50010
2012-08-15 22:00:35,487 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_331880463236233459 is added to 
invalidSet of 141.51.205.113:50010
2012-08-15 22:00:35,487 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_331880463236233459 is added to 
invalidSet of 141.51.205.30:50010
2012-08-15 22:00:35,487 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_331880463236233459 is added to 
invalidSet of 141.51.205.118:50010
2012-08-15 22:00:35,487 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_331880463236233459 is added to 
invalidSet of 141.51.205.117:50010
2012-08-15 22:00:35,487 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_331880463236233459 is added to 
invalidSet of 141.51.205.119:50010
2012-08-15 22:00:35,487 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_-2215699895714614889 is added to 
invalidSet of 141.51.205.113:50010
2012-08-15 22:00:35,487 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_-9153653706918013228 is added to 
invalidSet of 141.51.205.117:50010
2012-08-15 22:00:37,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
ask 141.51.205.115:50010 to delete blk_331880463236233459_1005 
blk_8707503977973501396_1004
2012-08-15 22:00:37,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
ask 141.51.205.118:50010 to delete blk_331880463236233459_1005 
blk_8707503977973501396_1004
2012-08-15 22:00:37,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
ask 141.51.205.114:50010 to delete blk_331880463236233459_1005 
blk_8707503977973501396_1004
2012-08-15 22:00:39,717 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/output/_logs/history/job_201208152128_0001_1345058961257_bmacek_rtprep.jar. 
blk_8780584073579865736_1010
2012-08-15 22:00:39,724 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 141.51.205.113:50010 is 
added to blk_8780584073579865736_1010 size 39356
2012-08-15 22:00:39,726 INFO org.apache.hadoop.hdfs.StateChange: 
Removing lease on  file 
/output/_logs/history/job_201208152128_0001_1345058961257_bmacek_rtprep.jar 
from client DFSClient_-1191648872
2012-08-15 22:00:39,727 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
NameSystem.completeFile: file 
/output/_logs/history/job_201208152128_0001_1345058961257_bmacek_rtprep.jar 
is closed by DFSClient_-1191648872
2012-08-15 22:00:39,750 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.addToInvalidates: blk_2297057220333714498 is added to 
invalidSet of 141.51.205.114:50010
<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 >>>>>>>>>>>>> Compare last time with above log<<<<<<<<<<<<<<<<<
<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

#################################### JOBTRACKER ... LAST LINES 
#################################

2012-08-15 22:00:33,295 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000008_2'
2012-08-15 22:00:33,296 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (TASK_CLEANUP) 'attempt_201208152128_0001_m_000008_2' to tip 
task_201208152128_0001_m_000008, for tracker 
'tracker_its-cs208.its.uni-kassel.de:localhost/127.0.0.1:58503'
2012-08-15 22:00:33,696 INFO org.apache.hadoop.mapred.TaskInProgress: 
TaskInProgress task_201208152128_0001_m_000007 has failed 4 times.
2012-08-15 22:00:33,696 INFO org.apache.hadoop.mapred.JobInProgress: 
Aborting job job_201208152128_0001
2012-08-15 22:00:33,696 INFO org.apache.hadoop.mapred.JobInProgress: 
Killing job 'job_201208152128_0001'
2012-08-15 22:00:33,697 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (JOB_CLEANUP) 'attempt_201208152128_0001_m_000016_0' to tip 
task_201208152128_0001_m_000016, for tracker 
'tracker_its-cs120.its.uni-kassel.de:localhost/127.0.0.1:52467'
2012-08-15 22:00:33,698 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000007_3'
2012-08-15 22:00:33,705 INFO org.apache.hadoop.mapred.TaskInProgress: 
Error from attempt_201208152128_0001_m_000015_2: Task 
attempt_201208152128_0001_m_000015_2 failed to report status for 601 
seconds. Killing!
2012-08-15 22:00:33,706 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000015_2'
2012-08-15 22:00:36,200 INFO org.apache.hadoop.mapred.TaskInProgress: 
Error from attempt_201208152128_0001_m_000003_2: Task 
attempt_201208152128_0001_m_000003_2 failed to report status for 600 
seconds. Killing!
2012-08-15 22:00:36,201 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000003_2'
2012-08-15 22:00:36,534 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000014_2'
2012-08-15 22:00:36,702 INFO org.apache.hadoop.mapred.TaskInProgress: 
Error from attempt_201208152128_0001_m_000012_2: Task 
attempt_201208152128_0001_m_000012_2 failed to report status for 601 
seconds. Killing!
2012-08-15 22:00:36,703 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000012_2'
2012-08-15 22:00:36,708 INFO org.apache.hadoop.mapred.TaskInProgress: 
Error from attempt_201208152128_0001_m_000009_2: Task 
attempt_201208152128_0001_m_000009_2 failed to report status for 601 
seconds. Killing!
2012-08-15 22:00:36,709 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000009_2'
2012-08-15 22:00:36,778 INFO org.apache.hadoop.mapred.TaskInProgress: 
Error from attempt_201208152128_0001_m_000000_2: Task 
attempt_201208152128_0001_m_000000_2 failed to report status for 601 
seconds. Killing!
2012-08-15 22:00:36,779 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000000_2'
2012-08-15 22:00:36,779 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000010_2'
2012-08-15 22:00:37,024 INFO org.apache.hadoop.mapred.TaskInProgress: 
TaskInProgress task_201208152128_0001_m_000007 has failed 4 times.
2012-08-15 22:00:37,085 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000007_4'
2012-08-15 22:00:37,085 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000014_3'
2012-08-15 22:00:38,758 INFO org.apache.hadoop.mapred.TaskInProgress: 
Error from attempt_201208152128_0001_m_000002_2: Task 
attempt_201208152128_0001_m_000002_2 failed to report status for 601 
seconds. Killing!
2012-08-15 22:00:38,759 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000002_2'
2012-08-15 22:00:38,759 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000013_3'
2012-08-15 22:00:39,202 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000001_4'
2012-08-15 22:00:39,202 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000003_3'
2012-08-15 22:00:39,205 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000004_2'
2012-08-15 22:00:39,206 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000005_2'
2012-08-15 22:00:39,206 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000009_3'
2012-08-15 22:00:39,240 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000000_3'
2012-08-15 22:00:39,240 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000010_3'
2012-08-15 22:00:39,303 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000001_3'
2012-08-15 22:00:39,303 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000008_2'
2012-08-15 22:00:39,539 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000006_3'
2012-08-15 22:00:39,707 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201208152128_0001_m_000016_0' has completed 
task_201208152128_0001_m_000016 successfully.
2012-08-15 22:00:39,712 INFO 
org.apache.hadoop.mapred.JobInProgress$JobSummary: 
jobId=job_201208152128_0001,submitTime=1345058961257,launchTime=1345058961719,firstMapTaskLaunchTime=1345058968763,firstJobSetupTaskLaunchTime=1345058962669,firstJobCleanupTaskLaunchTime=1345060833697,finishTime=1345060839709,numMaps=16,numSlotsPerMap=1,numReduces=1,numSlotsPerReduce=1,user=bmacek,queue=default,status=FAILED,mapSlotSeconds=8861,reduceSlotsSeconds=0,clusterMapCapacity=22,clusterReduceCapacity=22
2012-08-15 22:00:39,742 INFO org.apache.hadoop.mapred.JobTracker: 
Removing task 'attempt_201208152128_0001_m_000016_0'
2012-08-15 22:00:39,742 INFO org.apache.hadoop.mapred.JobHistory: 
Creating DONE subfolder at 
file:/gpfs/home02/fb16/bmacek/hadoop-1.0.2/logs/history/done/version-1/its-cs117.its.uni-kassel.de_1345058925479_/2012/08/15/000000
2012-08-15 22:00:39,747 INFO org.apache.hadoop.mapred.JobHistory: Moving 
file:/gpfs/home02/fb16/bmacek/hadoop-1.0.2/logs/history/job_201208152128_0001_1345058961257_bmacek_rtprep.jar 
to 
file:/gpfs/home02/fb16/bmacek/hadoop-1.0.2/logs/history/done/version-1/its-cs117.its.uni-kassel.de_1345058925479_/2012/08/15/000000
2012-08-15 22:00:39,762 INFO org.apache.hadoop.mapred.JobHistory: Moving 
file:/gpfs/home02/fb16/bmacek/hadoop-1.0.2/logs/history/job_201208152128_0001_conf.xml 
to 
file:/gpfs/home02/fb16/bmacek/hadoop-1.0.2/logs/history/done/version-1/its-cs117.its.uni-kassel.de_1345058925479_/2012/08/15/000000
<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
 >>>>>>>>>>>>> Compare last time with above log<<<<<<<<<<<<<<<<<
<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>





On 14.08.2012 13:25, Björn-Elmar Macek wrote:
> Hi Michael and Mohammad,
>
> thanks a lot for your input!
> I have pinged the people at the cluster in order to (possibly) 
> disable IPv6 and definitely check the ports corresponding to the 
> appropriate machines. I will keep you updated.
>
> Regards,
> Elmar
>
>
> On 13.08.2012 22:39, Michael Segel wrote:
>>
>> The key is to think about what can go wrong, but start with the low 
>> hanging fruit.
>>
>> I mean you could be right, however you're jumping the gun and are 
>> overlooking simpler issues.
>>
>> The most common issue is that the networking traffic is being filtered.
>> Of course since we're both diagnosing this with minimal information, 
>> we're kind of shooting from the hip.
>>
>> This is why I'm asking if there is any networking traffic between the 
>> nodes.  If you have partial communication, then focus on why you 
>> can't see the specific traffic.
>>
>>
>> On Aug 13, 2012, at 10:05 AM, Mohammad Tariq <dontariq@gmail.com 
>> <mailto:dontariq@gmail.com>> wrote:
>>
>>> Thank you so very much for the detailed response Michael. I'll keep 
>>> the tip in mind. Please pardon my ignorance, as I am still in the 
>>> learning phase.
>>>
>>> Regards,
>>>     Mohammad Tariq
>>>
>>>
>>>
>>> On Mon, Aug 13, 2012 at 8:29 PM, Michael Segel 
>>> <michael_segel@hotmail.com <mailto:michael_segel@hotmail.com>> wrote:
>>>
>>>     0.0.0.0 means that the call is going to all interfaces on the
>>>     machine.  (Shouldn't be an issue...)
>>>
>>>     IPv4 vs IPv6? Could be an issue, however OP says he can write
>>>     data to DNs and they seem to communicate, therefore if it's IPv6
>>>     related, wouldn't it impact all traffic and not just a specific
>>>     port?
>>>     I agree... shut down IPv6 if you can.
>>>
>>>     I don't disagree with your assessment. I am just suggesting that
>>>     before you do a really deep dive, you think about the more
>>>     obvious stuff first.
>>>
>>>     There are a couple of other things... like do all of the
>>>     /etc/hosts files on all of the machines match?
>>>     Is the OP using both /etc/hosts and DNS? If so, are they in sync?
>>>
>>>     BTW, you said DNS in your response. If you're using DNS, then
>>>     you don't really want to have much info in the /etc/hosts file
>>>     except loopback and the server's IP address.
>>>
>>>     Looking at the problem, the OP is indicating that some traffic
>>>     works, while other traffic doesn't. Most likely something is
>>>     blocking the ports. Iptables is the first place to look.
>>>
>>>     Just saying. ;-)
>>>
>>>
>>>     On Aug 13, 2012, at 9:12 AM, Mohammad Tariq <dontariq@gmail.com
>>>     <mailto:dontariq@gmail.com>> wrote:
>>>
>>>>     Hi Michael,
>>>>            I asked for the hosts file because there seems to be a
>>>>     loopback problem to me. The log shows that the call is going to
>>>>     0.0.0.0. Apart from what you have said, I think disabling IPv6
>>>>     and making sure that there is no problem with the DNS resolution
>>>>     is also necessary. Please correct me if I am wrong. Thank you.
>>>>
>>>>     Regards,
>>>>         Mohammad Tariq
>>>>
>>>>
>>>>
>>>>     On Mon, Aug 13, 2012 at 7:06 PM, Michael Segel
>>>>     <michael_segel@hotmail.com <mailto:michael_segel@hotmail.com>>
>>>>     wrote:
>>>>
>>>>         Based on your /etc/hosts output, why aren't you using DNS?
>>>>
>>>>         Outside of MapR, multihomed machines can be problematic.
>>>>         Hadoop doesn't generally work well when you're not using
>>>>         the FQDN or its alias.
>>>>
>>>>         The issue isn't the SSH. If you go to the node which is
>>>>         having trouble connecting to another node and then try to
>>>>         ping it, or some other general communication, and it
>>>>         succeeds, your issue is that the port you're trying to
>>>>         communicate with is blocked. Then it's more than likely an
>>>>         ipconfig or firewall issue.
>>>>
>>>>         On Aug 13, 2012, at 8:17 AM, Björn-Elmar Macek
>>>>         <ema@cs.uni-kassel.de <mailto:ema@cs.uni-kassel.de>> wrote:
>>>>
>>>>>         Hi Michael,
>>>>>
>>>>>         Well, I can ssh from any node to any other without being
>>>>>         prompted. The reason for this is that my home dir is
>>>>>         mounted on every server in the cluster.
>>>>>
>>>>>         Whether the machines are multihomed: I don't know. I could
>>>>>         ask if this would be of importance.
>>>>>
>>>>>         Shall i?
>>>>>
>>>>>         Regards,
>>>>>         Elmar
>>>>>
>>>>>         On 13.08.12 14:59, Michael Segel wrote:
>>>>>>         If the nodes can communicate and distribute data, then
>>>>>>         the odds are that the issue isn't going to be in his
>>>>>>         /etc/hosts.
>>>>>>
>>>>>>         A more relevant question is if he's running a firewall on
>>>>>>         each of these machines?
>>>>>>
>>>>>>         A simple test... ssh to one node, ping other nodes and
>>>>>>         the control nodes at random to see if they can see one
>>>>>>         another. Then check to see if there is a firewall running
>>>>>>         which would limit the types of traffic between nodes.
>>>>>>
>>>>>>         One other side note... are these machines multi-homed?
>>>>>>
>>>>>>         On Aug 13, 2012, at 7:51 AM, Mohammad Tariq
>>>>>>         <dontariq@gmail.com <mailto:dontariq@gmail.com>> wrote:
>>>>>>
>>>>>>>         Hello there,
>>>>>>>
>>>>>>>          Could you please share your /etc/hosts file, if you
>>>>>>>         don't mind.
>>>>>>>
>>>>>>>         Regards,
>>>>>>>          Mohammad Tariq
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         On Mon, Aug 13, 2012 at 6:01 PM, Björn-Elmar Macek
>>>>>>>         <macek@cs.uni-kassel.de <mailto:macek@cs.uni-kassel.de>>
>>>>>>>         wrote:
>>>>>>>
>>>>>>>             Hi,
>>>>>>>
>>>>>>>             I am currently trying to run my Hadoop program on a
>>>>>>>             cluster. Sadly, my datanodes and tasktrackers seem to
>>>>>>>             have difficulties communicating, as their logs show:
>>>>>>>             * Some datanodes and tasktrackers seem to have port
>>>>>>>             problems of some kind, as can be seen in the logs
>>>>>>>             below. I wondered whether this might be related to
>>>>>>>             the localhost entry in /etc/hosts, as you can read in
>>>>>>>             a lot of posts with similar errors, but I checked the
>>>>>>>             file: neither localhost nor 127.0.0.1/127.0.1.1 is
>>>>>>>             bound there. (Although you can ping localhost... the
>>>>>>>             technician of the cluster said he would look into
>>>>>>>             what resolves localhost.)
>>>>>>>             * The other nodes cannot speak with the namenode and
>>>>>>>             jobtracker (its-cs131), although it is absolutely not
>>>>>>>             clear why this is happening: the "dfs -put" I do
>>>>>>>             directly before the job runs fine, which seems to
>>>>>>>             imply that communication between those servers is
>>>>>>>             working flawlessly.
>>>>>>>
>>>>>>>             Is there any reason why this might happen?
>>>>>>>
>>>>>>>
>>>>>>>             Regards,
>>>>>>>             Elmar
>>>>>>>
>>>>>>>             LOGS BELOW:
>>>>>>>
>>>>>>>             \____Datanodes
>>>>>>>
>>>>>>>             After successfully putting the data into HDFS (at
>>>>>>>             this point I thought namenode and datanodes have to
>>>>>>>             communicate), I get the following errors when
>>>>>>>             starting the job:
>>>>>>>
>>>>>>>             There are 2 kinds of logs I found: the first one is
>>>>>>>             big (about 12 MB) and looks like this:
>>>>>>>             ############################### LOG TYPE 1
>>>>>>>             ############################################################
>>>>>>>             2012-08-13 08:23:27,331 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/>. Already tried 0 time(s).
>>>>>>>             2012-08-13 08:23:28,332 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/>. Already tried 1 time(s).
>>>>>>>             2012-08-13 08:23:29,332 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/>. Already tried 2 time(s).
>>>>>>>             2012-08-13 08:23:30,332 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/>. Already tried 3 time(s).
>>>>>>>             2012-08-13 08:23:31,333 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/>. Already tried 4 time(s).
>>>>>>>             2012-08-13 08:23:32,333 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/>. Already tried 5 time(s).
>>>>>>>             2012-08-13 08:23:33,334 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/>. Already tried 6 time(s).
>>>>>>>             2012-08-13 08:23:34,334 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/>. Already tried 7 time(s).
>>>>>>>             2012-08-13 08:23:35,334 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/>. Already tried 8 time(s).
>>>>>>>             2012-08-13 08:23:36,335 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/>. Already tried 9 time(s).
>>>>>>>             2012-08-13 08:23:36,335 WARN
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode:
>>>>>>>             java.net.ConnectException: Call to
>>>>>>>             its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/> failed on connection
>>>>>>>             exception: java.net.ConnectException: Connection refused
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client.call(Client.java:1071)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>>>>>>>                 at $Proxy5.sendHeartbeat(Unknown Source)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:904)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1458)
>>>>>>>                 at java.lang.Thread.run(Thread.java:619)
>>>>>>>             Caused by: java.net.ConnectException: Connection refused
>>>>>>>                 at
>>>>>>>             sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>                 at
>>>>>>>             sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>>>>>>                 at org.apache.hadoop.net
>>>>>>>             <http://org.apache.hadoop.net/>.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>>>>>>                 at org.apache.hadoop.net
>>>>>>>             <http://org.apache.hadoop.net/>.NetUtils.connect(NetUtils.java:489)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client.call(Client.java:1046)
>>>>>>>                 ... 5 more
>>>>>>>
>>>>>>>             ... (this continues til the end of the log)
>>>>>>>
>>>>>>>             The second is the short kind:
>>>>>>>             ########################### LOG TYPE 2
>>>>>>>             ############################################################
>>>>>>>             2012-08-13 00:59:19,038 INFO
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode:
>>>>>>>             STARTUP_MSG:
>>>>>>>             /************************************************************
>>>>>>>             STARTUP_MSG: Starting DataNode
>>>>>>>             STARTUP_MSG: host =
>>>>>>>             its-cs133.its.uni-kassel.de/141.51.205.43
>>>>>>>             <http://its-cs133.its.uni-kassel.de/141.51.205.43>
>>>>>>>             STARTUP_MSG: args = []
>>>>>>>             STARTUP_MSG: version = 1.0.2
>>>>>>>             STARTUP_MSG: build =
>>>>>>>             https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2
>>>>>>>             -r 1304954; compiled by 'hortonfo' on Sat Mar 24
>>>>>>>             23:58:21 UTC 2012
>>>>>>>             ************************************************************/
>>>>>>>             2012-08-13 00:59:19,203 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsConfig:
>>>>>>>             loaded properties from hadoop-metrics2.properties
>>>>>>>             2012-08-13 00:59:19,216 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean
>>>>>>>             for source MetricsSystem,sub=Stats registered.
>>>>>>>             2012-08-13 00:59:19,217 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
>>>>>>>             Scheduled snapshot period at 10 second(s).
>>>>>>>             2012-08-13 00:59:19,218 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
>>>>>>>             DataNode metrics system started
>>>>>>>             2012-08-13 00:59:19,306 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean
>>>>>>>             for source ugi registered.
>>>>>>>             2012-08-13 00:59:19,346 INFO
>>>>>>>             org.apache.hadoop.util.NativeCodeLoader: Loaded the
>>>>>>>             native-hadoop library
>>>>>>>             2012-08-13 00:59:20,482 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35554
>>>>>>>             <http://141.51.205.41:35554/>. Already tried 0 time(s).
>>>>>>>             2012-08-13 00:59:21,584 INFO
>>>>>>>             org.apache.hadoop.hdfs.server.common.Storage:
>>>>>>>             Storage directory
>>>>>>>             /home/work/bmacek/hadoop/hdfs/slave is not formatted.
>>>>>>>             2012-08-13 00:59:21,584 INFO
>>>>>>>             org.apache.hadoop.hdfs.server.common.Storage:
>>>>>>>             Formatting ...
>>>>>>>             2012-08-13 00:59:21,787 INFO
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode:
>>>>>>>             Registered FSDatasetStatusMBean
>>>>>>>             2012-08-13 00:59:21,897 INFO
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService:
>>>>>>>             Shutting down all async disk service threads...
>>>>>>>             2012-08-13 00:59:21,897 INFO
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService:
>>>>>>>             All async disk service threads have been shut down.
>>>>>>>             2012-08-13 00:59:21,898 ERROR
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode:
>>>>>>>             java.net.BindException: Problem binding to
>>>>>>>             /0.0.0.0:50010 <http://0.0.0.0:50010/> : Address
>>>>>>>             already in use
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Server.bind(Server.java:227)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:404)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
>>>>>>>             Caused by: java.net.BindException: Address already
>>>>>>>             in use
>>>>>>>                 at sun.nio.ch.Net.bind(Native Method)
>>>>>>>                 at sun.nio.ch
>>>>>>>             <http://sun.nio.ch/>.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>>>>>>>                 at sun.nio.ch
>>>>>>>             <http://sun.nio.ch/>.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Server.bind(Server.java:225)
>>>>>>>                 ... 7 more
>>>>>>>
>>>>>>>             2012-08-13 00:59:21,899 INFO
>>>>>>>             org.apache.hadoop.hdfs.server.datanode.DataNode:
>>>>>>>             SHUTDOWN_MSG:
>>>>>>>             /************************************************************
>>>>>>>             SHUTDOWN_MSG: Shutting down DataNode at
>>>>>>>             its-cs133.its.uni-kassel.de/141.51.205.43
>>>>>>>             <http://its-cs133.its.uni-kassel.de/141.51.205.43>
>>>>>>>             ************************************************************/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>             \_____TaskTracker
>>>>>>>             With TaskTrackers it is the same: there are 2 kinds.
>>>>>>>             ############################### LOG TYPE 1
>>>>>>>             ############################################################
>>>>>>>             2012-08-13 02:09:54,645 INFO
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: Resending
>>>>>>>             'status' to 'its-cs131' with reponseId '879
>>>>>>>             2012-08-13 02:09:55,646 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/>. Already tried 0 time(s).
>>>>>>>             2012-08-13 02:09:56,646 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/>. Already tried 1 time(s).
>>>>>>>             2012-08-13 02:09:57,647 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/>. Already tried 2 time(s).
>>>>>>>             2012-08-13 02:09:58,647 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/>. Already tried 3 time(s).
>>>>>>>             2012-08-13 02:09:59,648 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/>. Already tried 4 time(s).
>>>>>>>             2012-08-13 02:10:00,648 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/>. Already tried 5 time(s).
>>>>>>>             2012-08-13 02:10:01,649 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/>. Already tried 6 time(s).
>>>>>>>             2012-08-13 02:10:02,649 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/>. Already tried 7 time(s).
>>>>>>>             2012-08-13 02:10:03,650 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/>. Already tried 8 time(s).
>>>>>>>             2012-08-13 02:10:04,650 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/>. Already tried 9 time(s).
>>>>>>>             2012-08-13 02:10:04,651 ERROR
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: Caught
>>>>>>>             exception: java.net.ConnectException: Call to
>>>>>>>             its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/> failed on connection
>>>>>>>             exception: java.net.ConnectException: Connection refused
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client.call(Client.java:1071)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.mapred.$Proxy5.heartbeat(Unknown
>>>>>>>             Source)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1857)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1653)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2503)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3744)
>>>>>>>             Caused by: java.net.ConnectException: Connection refused
>>>>>>>                 at
>>>>>>>             sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>                 at
>>>>>>>             sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>>>>>>                 at org.apache.hadoop.net
>>>>>>>             <http://org.apache.hadoop.net/>.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>>>>>>                 at org.apache.hadoop.net
>>>>>>>             <http://org.apache.hadoop.net/>.NetUtils.connect(NetUtils.java:489)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.ipc.Client.call(Client.java:1046)
>>>>>>>                 ... 6 more
>>>>>>>
>>>>>>>
>>>>>>>             ########################### LOG TYPE 2
>>>>>>>             ############################################################
>>>>>>>             2012-08-13 00:59:24,376 INFO
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
>>>>>>>             /************************************************************
>>>>>>>             STARTUP_MSG: Starting TaskTracker
>>>>>>>             STARTUP_MSG: host =
>>>>>>>             its-cs133.its.uni-kassel.de/141.51.205.43
>>>>>>>             <http://its-cs133.its.uni-kassel.de/141.51.205.43>
>>>>>>>             STARTUP_MSG: args = []
>>>>>>>             STARTUP_MSG: version = 1.0.2
>>>>>>>             STARTUP_MSG: build =
>>>>>>>             https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2
>>>>>>>             -r 1304954; compiled by 'hortonfo' on Sat Mar 24
>>>>>>>             23:58:21 UTC 2012
>>>>>>>             ************************************************************/
>>>>>>>             2012-08-13 00:59:24,569 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsConfig:
>>>>>>>             loaded properties from hadoop-metrics2.properties
>>>>>>>             2012-08-13 00:59:24,626 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean
>>>>>>>             for source MetricsSystem,sub=Stats registered.
>>>>>>>             2012-08-13 00:59:24,627 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
>>>>>>>             Scheduled snapshot period at 10 second(s).
>>>>>>>             2012-08-13 00:59:24,627 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSystemImpl:
>>>>>>>             TaskTracker metrics system started
>>>>>>>             2012-08-13 00:59:24,950 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean
>>>>>>>             for source ugi registered.
>>>>>>>             2012-08-13 00:59:25,146 INFO org.mortbay.log:
>>>>>>>             Logging to
>>>>>>>             org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log)
>>>>>>>             via org.mortbay.log.Slf4jLog
>>>>>>>             2012-08-13 00:59:25,206 INFO
>>>>>>>             org.apache.hadoop.http.HttpServer: Added global
>>>>>>>             filtersafety
>>>>>>>             (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>>>>>>>             2012-08-13 00:59:25,232 INFO
>>>>>>>             org.apache.hadoop.mapred.TaskLogsTruncater:
>>>>>>>             Initializing logs' truncater with mapRetainSize=-1
>>>>>>>             and reduceRetainSize=-1
>>>>>>>             2012-08-13 00:59:25,237 INFO
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: Starting
>>>>>>>             tasktracker with owner as bmacek
>>>>>>>             2012-08-13 00:59:25,239 INFO
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: Good mapred
>>>>>>>             local directories are:
>>>>>>>             /home/work/bmacek/hadoop/hdfs/tmp/mapred/local
>>>>>>>             2012-08-13 00:59:25,244 INFO
>>>>>>>             org.apache.hadoop.util.NativeCodeLoader: Loaded the
>>>>>>>             native-hadoop library
>>>>>>>             2012-08-13 00:59:25,255 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean
>>>>>>>             for source jvm registered.
>>>>>>>             2012-08-13 00:59:25,256 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean
>>>>>>>             for source TaskTrackerMetrics registered.
>>>>>>>             2012-08-13 00:59:25,279 INFO
>>>>>>>             org.apache.hadoop.ipc.Server: Starting SocketReader
>>>>>>>             2012-08-13 00:59:25,282 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean
>>>>>>>             for source RpcDetailedActivityForPort54850 registered.
>>>>>>>             2012-08-13 00:59:25,282 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean
>>>>>>>             for source RpcActivityForPort54850 registered.
>>>>>>>             2012-08-13 00:59:25,287 INFO
>>>>>>>             org.apache.hadoop.ipc.Server: IPC Server Responder:
>>>>>>>             starting
>>>>>>>             2012-08-13 00:59:25,288 INFO
>>>>>>>             org.apache.hadoop.ipc.Server: IPC Server listener on
>>>>>>>             54850: starting
>>>>>>>             2012-08-13 00:59:25,288 INFO
>>>>>>>             org.apache.hadoop.ipc.Server: IPC Server handler 0
>>>>>>>             on 54850: starting
>>>>>>>             2012-08-13 00:59:25,288 INFO
>>>>>>>             org.apache.hadoop.ipc.Server: IPC Server handler 1
>>>>>>>             on 54850: starting
>>>>>>>             2012-08-13 00:59:25,289 INFO
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: TaskTracker up
>>>>>>>             at: localhost/127.0.0.1:54850 <http://127.0.0.1:54850/>
>>>>>>>             2012-08-13 00:59:25,289 INFO
>>>>>>>             org.apache.hadoop.ipc.Server: IPC Server handler 3
>>>>>>>             on 54850: starting
>>>>>>>             2012-08-13 00:59:25,289 INFO
>>>>>>>             org.apache.hadoop.ipc.Server: IPC Server handler 2
>>>>>>>             on 54850: starting
>>>>>>>             2012-08-13 00:59:25,289 INFO
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: Starting
>>>>>>>             tracker
>>>>>>>             tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
>>>>>>>             <http://127.0.0.1:54850/>
>>>>>>>             2012-08-13 00:59:26,321 INFO
>>>>>>>             org.apache.hadoop.ipc.Client: Retrying connect to
>>>>>>>             server: its-cs131/141.51.205.41:35555
>>>>>>>             <http://141.51.205.41:35555/>. Already tried 0 time(s).
>>>>>>>             2012-08-13 00:59:38,104 INFO
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: Starting
>>>>>>>             thread: Map-events fetcher for all reduce tasks on
>>>>>>>             tracker_its-cs133.its.uni-kassel.de:localhost/127.0.0.1:54850
>>>>>>>             <http://127.0.0.1:54850/>
>>>>>>>             2012-08-13 00:59:38,120 INFO
>>>>>>>             org.apache.hadoop.util.ProcessTree: setsid exited
>>>>>>>             with exit code 0
>>>>>>>             2012-08-13 00:59:38,134 INFO
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: Using
>>>>>>>             ResourceCalculatorPlugin :
>>>>>>>             org.apache.hadoop.util.LinuxResourceCalculatorPlugin@445e228
>>>>>>>             2012-08-13 00:59:38,137 WARN
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: TaskTracker's
>>>>>>>             totalMemoryAllottedForTasks is -1. TaskMemoryManager
>>>>>>>             is disabled.
>>>>>>>             2012-08-13 00:59:38,145 INFO
>>>>>>>             org.apache.hadoop.mapred.IndexCache: IndexCache
>>>>>>>             created with max memory = 10485760
>>>>>>>             2012-08-13 00:59:38,158 INFO
>>>>>>>             org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean
>>>>>>>             for source ShuffleServerMetrics registered.
>>>>>>>             2012-08-13 00:59:38,161 INFO
>>>>>>>             org.apache.hadoop.http.HttpServer: Port returned by
>>>>>>>             webServer.getConnectors()[0].getLocalPort() before
>>>>>>>             open() is -1. Opening the listener on 50060
>>>>>>>             2012-08-13 00:59:38,161 ERROR
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: Can not start
>>>>>>>             task tracker because java.net.BindException: Address
>>>>>>>             already in use
>>>>>>>                 at sun.nio.ch.Net.bind(Native Method)
>>>>>>>                 at sun.nio.ch
>>>>>>>             <http://sun.nio.ch/>.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>>>>>>>                 at sun.nio.ch
>>>>>>>             <http://sun.nio.ch/>.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>>>>>>>                 at
>>>>>>>             org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.http.HttpServer.start(HttpServer.java:581)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1502)
>>>>>>>                 at
>>>>>>>             org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3742)
>>>>>>>
>>>>>>>             2012-08-13 00:59:38,163 INFO
>>>>>>>             org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
>>>>>>>             /************************************************************
>>>>>>>             SHUTDOWN_MSG: Shutting down TaskTracker at
>>>>>>>             its-cs133.its.uni-kassel.de/141.51.205.43
>>>>>>>             <http://its-cs133.its.uni-kassel.de/141.51.205.43>
>>>>>>>             ************************************************************/
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>

