hadoop-common-user mailing list archives

From Dennis Kubes <nutch-...@dragonflymc.com>
Subject Tasks Fail after having completed and Reduces lost
Date Fri, 02 Jun 2006 19:35:30 GMT
I am seeing some weird behavior with tasks on hadoop-0.3 dev with the 
current nutch.  This is on an 11 node cluster with 55 map tasks, 
although the problem only happens in the fetcher, which runs only 
11 map tasks (with 10 fetcher threads each) and 11 reduce tasks.  
It is running Java 5.06. 

I am doing 100K page crawls and the tasks complete with the following 
message in the logs:

060603 023804 task_0006_m_000001_0 done; removing files.

But then some of the completed tasks are marked as failed.  I am also 
seeing a lot of errors like these in the logs:

060603 023607 Error from task_0006_r_000007_3: Task failed to report 
status for 603 seconds. Killing.
060603 023607 Aborting job job_0006
060603 023607 Task 'task_0006_r_000008_3' has been lost.
060603 023607 TaskInProgress tip_0006_r_000008 has failed 4 times.
060603 023607 Error from task_0006_r_000008_3: Task failed to report 
status for 608 seconds. Killing.

and then finally I see these errors:

060603 023614 Task 'task_0006_m_000000_0' has been lost.
060603 023614 Aborting job job_0006
060603 023616 Task 'task_0006_m_000001_0' has been lost.
060603 023616 Aborting job job_0006
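
For anyone sifting through similar logs, here is a minimal sketch (plain 
Python, not part of Hadoop or Nutch) that tallies which tasks timed out 
and which were reported lost.  The function name `summarize` and both 
regular expressions are assumptions based only on the message format in 
the excerpts above; other versions may log differently.

```python
import re
from collections import Counter

# Both patterns are assumptions derived from the log excerpts in this message.
TIMEOUT_RE = re.compile(
    r"Error from (task_\d+_[mr]_\d+_\d+): "
    r"Task failed to report status for (\d+) seconds"
)
LOST_RE = re.compile(r"Task '(task_\d+_[mr]_\d+_\d+)' has been lost")


def summarize(log_text):
    """Return (timeouts, lost).

    timeouts maps a task id to the last reported number of silent seconds;
    lost counts how many times each task id was reported lost.
    """
    # Log lines may be wrapped, so collapse all whitespace runs first.
    flat = " ".join(log_text.split())
    timeouts = {task: int(secs) for task, secs in TIMEOUT_RE.findall(flat)}
    lost = Counter(LOST_RE.findall(flat))
    return timeouts, lost
```

Calling `summarize(open(path).read())` on a log file (where `path` is 
whatever log you are inspecting) gives a quick view of whether the 
timeouts cluster on reduce tasks (`_r_`) or map tasks (`_m_`).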

I know this is a general question, but can someone point me in the 
direction of why this might be happening?  It seems like the maps are 
taking too long and the reduces just give up.

