hadoop-common-user mailing list archives

From dave bayer <da...@cloudfactory.org>
Subject Re: job stops progressing because tasktracker stop taking tasks
Date Mon, 26 Oct 2009 22:58:20 GMT

On Oct 25, 2009, at 10:29 PM, Runping Qi wrote:

> I was testing a job on a single-node Hadoop cluster running Hadoop
> 0.19. The single tasktracker has 2 reduce slots.
> After finishing 8 of the 17 reduce tasks, the tasktracker stopped
> taking any new tasks, and the job made no further progress.
>
> Has anybody encountered a similar situation?

I had something similar happen over the weekend. Not sure if this is
exactly what you were seeing:

On a ~20-node cluster running 0.19.2, with 90 map slots and 45 reduce slots:

The jobtracker stopped scheduling jobs, and the web UI showed no jobs either
running or in the completed/failed lists. I didn't check the queue lists (I
have 3 queues: one for ad hoc jobs, one for nightly production jobs and one
for data-load jobs). This is with the default JobQueueTaskScheduler. (I had
tried the Capacity Scheduler, but it ran into deadlocks: threads would obtain
monitors and then call routines through the reflection API that attempted to
lock the same monitor.)

The jobtracker would accept new jobs, issue IDs and even report map and
reduce status (which never went beyond 0%), but it would not show these jobs
in the web UI, and I don't believe they appeared in the hadoop job -list
output, which, if memory serves, was empty.

Nothing in the logs pointed to a problem. jstack didn't show any deadlocks,
or any thread really doing much of anything. I didn't think to attach a
remote debugger to the process until after I had restarted it.
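
For anyone who wants to run the same kind of deadlock check from inside a
JVM rather than with jstack, here is a minimal sketch. It assumes Java 8+
and uses the standard java.lang.management call
ThreadMXBean.findDeadlockedThreads(); the DeadlockProbe class, its methods
and the worker thread names are hypothetical, and the two-lock demo only
illustrates what a detectable monitor deadlock looks like, not what
happened inside the jobtracker.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DeadlockProbe {

    // Names of threads the JVM reports as deadlocked on monitors or ownable
    // synchronizers; empty list if none.
    public static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // entry is null if the thread is no longer alive
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    // Polls for up to timeoutMillis until a deadlock is reported.
    public static List<String> waitForDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> found = deadlockedThreadNames();
        while (found.isEmpty() && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            found = deadlockedThreadNames();
        }
        return found;
    }

    // Hypothetical demo: two daemon threads take the same two monitors in
    // opposite order (a classic lock-ordering deadlock).
    public static void provokeDeadlock() {
        final Object lockA = new Object();
        final Object lockB = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread t1 = new Thread(() -> {
            synchronized (lockA) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (lockB) { } // blocks forever: worker-2 holds lockB
            }
        }, "worker-1");
        Thread t2 = new Thread(() -> {
            synchronized (lockB) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (lockA) { } // blocks forever: worker-1 holds lockA
            }
        }, "worker-2");
        t1.setDaemon(true); // daemon threads do not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();
        awaitQuietly(bothHoldFirstLock); // both threads now hold their first lock
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("before: " + deadlockedThreadNames());
        provokeDeadlock();
        System.out.println("after:  " + waitForDeadlock(5000));
    }
}
```

Note that this only finds threads blocked in a cycle on monitors or ownable
synchronizers. A process where every thread is simply idle, as described
above, would come back with an empty list here too.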

I didn't find anything in JIRA that might relate to this. I don't have steps
to reproduce it, because everything seemed to be 'working' but no progress
was ever made.

dave bayer
