hadoop-mapreduce-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Reduce task progress at 100% but status is still RUNNING
Date Sat, 18 Dec 2010 14:00:51 GMT
Are you doing a lot of work in your reducer? Try reporting a status
string back at various progress points in your reducer logic (i.e. a
status update).

A task will time out if it has not written anything, read anything, or
reported anything about its status.
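
As a minimal sketch (assuming the new org.apache.hadoop.mapreduce API in
0.20.2, with placeholder key/value types and a hypothetical class name),
the reducer could call context.setStatus() / context.progress() every
few thousand values so the TaskTracker knows it is still alive:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer, shown only to illustrate periodic status reporting.
public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    long seen = 0;
    for (IntWritable v : values) {
      sum += v.get();
      // Report back periodically, even before any output is written,
      // so the task is not killed for failing to report status.
      if (++seen % 10000 == 0) {
        context.setStatus("key " + key + ": processed " + seen + " values");
        context.progress();
      }
    }
    context.write(key, new IntWritable(sum));
  }
}

The 600-second window in your log is the task timeout
(mapred.task.timeout, in milliseconds); raising it only hides the
problem, while reporting progress as above actually fixes it.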

On Sat, Dec 18, 2010 at 7:24 PM, Ivan Leonardi <ivanleonardi@gmail.com> wrote:
> Hi all,
> I'm running Hadoop 0.20.2 on a cluster of 4 machines, and I'm trying
> to run my own job. Here is my configuration:
>
> core-site.xml
> <configuration>
> <property>
> <name>fs.default.name</name><value>hdfs://net-server00:44310</value>
> </property>
> <property>
> <name>hadoop.tmp.dir</name><value>/tmp/hadoop-leonardi</value>
> </property>
> </configuration>
>
> hdfs-site.xml
> <configuration>
> <property>
> <name>dfs.replication</name><value>1</value>
> </property>
> </configuration>
>
> mapred-site.xml
> <configuration>
> <property>
> <name>mapred.job.tracker</name><value>net-server00:44311</value>
> </property>
> <property>
> <name>mapred.tasktracker.map.tasks.maximum</name><value>3</value>
> </property>
> <property>
> <name>mapred.tasktracker.reduce.tasks.maximum</name><value>3</value>
> </property>
> <property>
> <name>tasktracker.http.threads</name><value>60</value>
> </property>
> <property>
> <name>mapred.reduce.parallel.copies</name><value>50</value>
> </property>
> </configuration>
>
> My job runs normally at the beginning:
> 10/12/18 14:16:36 INFO input.FileInputFormat: Total input paths to process : 30
> 10/12/18 14:16:36 INFO mapred.JobClient: Running job: job_201012181359_0004
> 10/12/18 14:16:37 INFO mapred.JobClient:  map 0% reduce 0%
> 10/12/18 14:16:51 INFO mapred.JobClient:  map 20% reduce 0%
> 10/12/18 14:16:54 INFO mapred.JobClient:  map 40% reduce 0%
> 10/12/18 14:17:00 INFO mapred.JobClient:  map 60% reduce 0%
> 10/12/18 14:17:03 INFO mapred.JobClient:  map 80% reduce 2%
> 10/12/18 14:17:06 INFO mapred.JobClient:  map 80% reduce 5%
> 10/12/18 14:17:09 INFO mapred.JobClient:  map 90% reduce 7%
> 10/12/18 14:17:12 INFO mapred.JobClient:  map 100% reduce 9%
> 10/12/18 14:17:15 INFO mapred.JobClient:  map 100% reduce 10%
> 10/12/18 14:17:18 INFO mapred.JobClient:  map 100% reduce 11%
> 10/12/18 14:17:19 INFO mapred.JobClient:  map 100% reduce 13%
> 10/12/18 14:17:21 INFO mapred.JobClient:  map 100% reduce 21%
> 10/12/18 14:17:22 INFO mapred.JobClient:  map 100% reduce 23%
> 10/12/18 14:17:24 INFO mapred.JobClient:  map 100% reduce 28%
> 10/12/18 14:17:27 INFO mapred.JobClient:  map 100% reduce 33%
> 10/12/18 14:17:29 INFO mapred.JobClient:  map 100% reduce 35%
> 10/12/18 14:17:31 INFO mapred.JobClient:  map 100% reduce 38%
> 10/12/18 14:17:34 INFO mapred.JobClient:  map 100% reduce 40%
>
> at this point, these percentages stay stuck for 10 minutes (600 secs).
> In the meantime I looked at the TaskTracker status page of each
> machine, and I see 3 running reduce tasks with Status=RUNNING and
> Progress=100%; here is a sample:
> TaskAttempts                           Status    Progress
> attempt_201012181359_0004_r_000001_0   RUNNING   100.00%
> attempt_201012181359_0004_r_000003_0   RUNNING   100.00%
> attempt_201012181359_0004_r_000006_0   RUNNING   100.00%
>
> Then, after 600 secs, the console logging resumes, emitting the
> following:
> 10/12/18 14:27:34 INFO mapred.JobClient:  map 100% reduce 36%
> 10/12/18 14:27:36 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000006_0, Status : FAILED Task
> attempt_201012181359_0004_r_000006_0 failed to report status for 602
> seconds. Killing!
> 10/12/18 14:27:38 INFO mapred.JobClient:  map 100% reduce 23%
> 10/12/18 14:27:40 INFO mapred.JobClient:  map 100% reduce 6%
> 10/12/18 14:27:40 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000007_0, Status : FAILED
> Task attempt_201012181359_0004_r_000007_0 failed to report status for
> 601 seconds. Killing!
> 10/12/18 14:27:40 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000010_0, Status : FAILED
> Task attempt_201012181359_0004_r_000010_0 failed to report status for
> 601 seconds. Killing!
> 10/12/18 14:27:41 INFO mapred.JobClient:  map 100% reduce 0%
> 10/12/18 14:27:42 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000000_0, Status : FAILED
> Task attempt_201012181359_0004_r_000000_0 failed to report status for
> 600 seconds. Killing!
> 10/12/18 14:27:42 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000009_0, Status : FAILED
> Task attempt_201012181359_0004_r_000009_0 failed to report status for
> 600 seconds. Killing!
> 10/12/18 14:27:43 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000001_0, Status : FAILED
> Task attempt_201012181359_0004_r_000001_0 failed to report status for
> 601 seconds. Killing!
> 10/12/18 14:27:43 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000003_0, Status : FAILED
> Task attempt_201012181359_0004_r_000003_0 failed to report status for
> 602 seconds. Killing!
> 10/12/18 14:27:45 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000004_0, Status : FAILED
> Task attempt_201012181359_0004_r_000004_0 failed to report status for
> 601 seconds. Killing!
> 10/12/18 14:27:45 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000008_0, Status : FAILED
> Task attempt_201012181359_0004_r_000008_0 failed to report status for
> 601 seconds. Killing!
> 10/12/18 14:27:46 INFO mapred.JobClient:  map 100% reduce 1%
> 10/12/18 14:27:46 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000011_0, Status : FAILED
> Task attempt_201012181359_0004_r_000011_0 failed to report status for
> 601 seconds. Killing!
> 10/12/18 14:27:46 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000002_0, Status : FAILED
> Task attempt_201012181359_0004_r_000002_0 failed to report status for
> 602 seconds. Killing!
> 10/12/18 14:27:48 INFO mapred.JobClient: Task Id :
> attempt_201012181359_0004_r_000005_0, Status : FAILED
> Task attempt_201012181359_0004_r_000005_0 failed to report status for
> 602 seconds. Killing!
> 10/12/18 14:27:49 INFO mapred.JobClient:  map 100% reduce 2%
> 10/12/18 14:27:55 INFO mapred.JobClient:  map 100% reduce 4%
> 10/12/18 14:27:56 INFO mapred.JobClient:  map 100% reduce 6%
> 10/12/18 14:27:58 INFO mapred.JobClient:  map 100% reduce 9%
> 10/12/18 14:28:00 INFO mapred.JobClient:  map 100% reduce 14%
> 10/12/18 14:28:02 INFO mapred.JobClient:  map 100% reduce 22%
> 10/12/18 14:28:03 INFO mapred.JobClient:  map 100% reduce 25%
> 10/12/18 14:28:05 INFO mapred.JobClient:  map 100% reduce 31%
> 10/12/18 14:28:06 INFO mapred.JobClient:  map 100% reduce 33%
> 10/12/18 14:28:08 INFO mapred.JobClient:  map 100% reduce 36%
> 10/12/18 14:28:09 INFO mapred.JobClient:  map 100% reduce 37%
> 10/12/18 14:28:11 INFO mapred.JobClient:  map 100% reduce 38%
> 10/12/18 14:28:12 INFO mapred.JobClient:  map 100% reduce 39%
> 10/12/18 14:28:14 INFO mapred.JobClient:  map 100% reduce 40%
>
> and such behaviour continues.
>
> Has anyone had the same problem and can help me?
> Thank you all!
>
> Ivan
>



-- 
Harsh J
www.harshj.com
