hadoop-common-user mailing list archives

From Amar Kamat <ama...@yahoo-inc.com>
Subject Re: Job tracker not responding during streaming job
Date Tue, 07 Apr 2009 04:42:50 GMT
David Kellogg wrote:
> I am running Hadoop streaming. After around 42 jobs on an 18-node 
> cluster, the jobtracker stops responding. This happens on 
> normally-working code. Here are the symptoms.
>
> 1. A job is running, but it pauses with reduce stuck at XX%
> 2. "hadoop job -list" hangs or takes a very long time to return
> 3. In the Ganglia metrics on the Jobtracker node:
>      a. jvm.metrics__JobTracker__gcTimeMillis rises above 20 k (20 
> seconds) before failure
>      b. jvm.metrics__JobTracker__memHeapUsedM rises above 600 before 
> failure
>      c. jvm.metrics__JobTracker__gcCount rises above 1 k before failure
>
>
> The ticker looks like this.
>
> 09/04/06 03:06:28 INFO streaming.StreamJob:  map 24%  reduce 7%
> 09/04/06 03:13:44 INFO streaming.StreamJob:  map 25%  reduce 7%
> After the 03:13:44 line, it hangs for more than 15 minutes.
>
> In the jobtracker log, I see this.
>
> 2009-04-04 04:19:13,563 WARN org.apache.hadoop.hdfs.DFSClient: Error 
> Recovery for block blk_-8143535428142072268_95993 failed  because 
> recovery from primary datanode 10.1.0.156:50010 failed 4 times. Will 
> retry...
>
> After restarting both dfs and mapreduce on all nodes, the problem goes 
> away, and the formerly non-working job proceeds without failure.
David,
What version are you using?
This can be caused by:
1) The number of tasks held in the jobtracker's memory might exceed its 
limits. What is the total number of tasks in the jobtracker's memory? 
What is the jobtracker's heap size? Try increasing the heap size and 
also try setting the mapred.jobtracker.completeuserjobs.maximum 
parameter to some low value.
2) Sometimes a slow or bad datanode causes the jobtracker to get stuck. 
As you mentioned, this might be the cause. Can you send us the output 
of 'kill -3' on the jobtracker process?
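
For reference, here is a rough sketch of how the thread dump and the heap change could be done. It assumes that `jps` from the JDK is on the PATH and lists the jobtracker under the class name "JobTracker", and that the daemon was started with the stock Hadoop scripts. The function names are made up for illustration, and the heap value is only an example.

```shell
# Sketch only: assumes `jps` (from the JDK) is on the PATH and that the
# jobtracker appears in its output under the class name "JobTracker".

# Pick the JobTracker pid out of jps-style output ("<pid> <ClassName>" per line).
jobtracker_pid() {
  awk '$2 == "JobTracker" { print $1 }'
}

# Send SIGQUIT (signal 3) to the jobtracker. The JVM keeps running and
# writes a thread dump to its stdout, which the Hadoop start scripts
# normally redirect to the jobtracker's .out file in the log directory.
dump_jobtracker_threads() {
  pid=$(jps | jobtracker_pid)
  [ -n "$pid" ] && kill -3 "$pid"
}

# Heap size: in conf/hadoop-env.sh, HADOOP_HEAPSIZE sets the daemon heap in MB.
# 2000 is an illustrative value, not a recommendation.
# export HADOOP_HEAPSIZE=2000
```

Run dump_jobtracker_threads on the jobtracker node while it is hung, then look at the end of the jobtracker's .out file for the dump.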
>
> Does anyone else see this problem?
>
> David Kellogg

