hadoop-common-user mailing list archives

From David Kellogg <d...@wink.com>
Subject Job tracker not responding during streaming job
Date Mon, 06 Apr 2009 22:18:01 GMT
I am running Hadoop streaming. After around 42 jobs on an 18-node
cluster, the jobtracker stops responding. This happens with code that
normally works. Here are the symptoms:

1. A job is running, but it pauses with reduce stuck at XX%
2. "hadoop job -list" hangs or takes a very long time to return
3. In the Ganglia metrics on the Jobtracker node:
      a. jvm.metrics__JobTracker__gcTimeMillis rises above 20 k (20 seconds) before failure
      b. jvm.metrics__JobTracker__memHeapUsedM rises above 600 before failure
      c. jvm.metrics__JobTracker__gcCount rises above 1 k before failure
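
The three metric levels above could be turned into a simple early-warning check. The following is only a sketch: the thresholds are the values observed on this one cluster, not recommended limits, and the `sample` dictionary and `exceeded` function are hypothetical names. How the values are fetched from Ganglia is left out.

```python
# Levels observed shortly before the jobtracker stopped responding.
# These are observations from one cluster, not recommended limits.
THRESHOLDS = {
    "jvm.metrics__JobTracker__gcTimeMillis": 20_000,  # 20 k ms = 20 seconds
    "jvm.metrics__JobTracker__memHeapUsedM": 600,
    "jvm.metrics__JobTracker__gcCount": 1_000,
}


def exceeded(sample):
    """Return the sorted names of metrics in `sample` above their threshold.

    `sample` is a hypothetical dict mapping metric name to its latest value;
    metrics missing from the sample are treated as not exceeded.
    """
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if sample.get(name, 0) > limit
    )
```

Calling `exceeded` on each new sample and alerting when the result is non-empty would give some warning before the hang, assuming the same pattern repeats.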

The ticker looks like this.

09/04/06 03:06:28 INFO streaming.StreamJob:  map 24%  reduce 7%
09/04/06 03:13:44 INFO streaming.StreamJob:  map 25%  reduce 7%
After the 03:13:44 line, it hangs for more than 15 minutes.
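
Stalls like this can be spotted automatically by measuring the time between consecutive progress lines. The following is a minimal sketch, assuming the ticker format shown above with a two-digit year/month/day timestamp; `parse_progress`, `find_gaps`, and the 5-minute default threshold are illustrative choices, not part of Hadoop.

```python
import re
from datetime import datetime, timedelta

# Matches ticker lines such as:
# 09/04/06 03:06:28 INFO streaming.StreamJob:  map 24%  reduce 7%
PROGRESS = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO streaming\.StreamJob:\s+"
    r"map (\d+)%\s+reduce (\d+)%"
)


def parse_progress(lines):
    """Return (timestamp, map_pct, reduce_pct) for each progress line."""
    out = []
    for line in lines:
        m = PROGRESS.match(line.strip())
        if m:
            # Assumes the timestamp is year/month/day with a two-digit year.
            ts = datetime.strptime(m.group(1), "%y/%m/%d %H:%M:%S")
            out.append((ts, int(m.group(2)), int(m.group(3))))
    return out


def find_gaps(progress, threshold=timedelta(minutes=5)):
    """Return (start, end, gap) for consecutive updates more than `threshold` apart."""
    gaps = []
    for earlier, later in zip(progress, progress[1:]):
        gap = later[0] - earlier[0]
        if gap > threshold:
            gaps.append((earlier[0], later[0], gap))
    return gaps
```

Applied to the two ticker lines above, this flags the interval between 03:06:28 and 03:13:44. Detecting the final hang would additionally require comparing the last timestamp with the current time.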

In the jobtracker log, I see this.

2009-04-04 04:19:13,563 WARN org.apache.hadoop.hdfs.DFSClient: Error  
Recovery for block blk_-8143535428142072268_95993 failed  because  
recovery from primary datanode failed 4 times. Will  

After restarting both dfs and mapreduce on all nodes, the problem
goes away, and the formerly non-working job proceeds without failure.

Does anyone else see this problem?

David Kellogg
