hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process
Date Fri, 04 May 2012 15:26:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268447#comment-13268447
] 

Robert Joseph Evans commented on MAPREDUCE-4208:
------------------------------------------------

There are two types of timeouts that happen for a task.  The first is a user configurable
timeout with a default of 10 min.  If a task does not make any progress for 10 min then the
task will timeout.  Progress is defined as reading a record of input, writing a record of
output, or explicitly in your map/reduce code calling a method in the Context to indicate
that you are still making progress. 

The other timeout is that if your task does not heartbeat into the AM for 5 min the task is
assumed to be dead.

In your case you are still making progress, it is just very slow, so neither of the timeouts
would take place.  You could possible set the progress interval to be something very small
and that would probably detect the problem.  We do not want to change the default value for
the progress interval because there are lots of jobs that may be doing significant amounts
of computation for each record, and we don't want to kill them off.

mapreduce.task.timeout is the timeout you would want to configure.
                
> The job is hanging up but never continuing until you kill the child process 
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>         Environment: Hadoop 0.20.203.0
> Hbase 0.90.3
> Hive 0.80.1
>            Reporter: ccw
>
> I use the hive MR query on hbase,but the job is never end.
> The job is hanging but never continuing util you kill the child process 
> 2012-04-28 18:22:33,661 Stage-1 map = 0%,  reduce = 0%
> 2012-04-28 18:22:59,760 Stage-1 map = 25%,  reduce = 0%
> 2012-04-28 18:23:04,782 Stage-1 map = 38%,  reduce = 0%
> 2012-04-28 18:23:07,796 Stage-1 map = 50%,  reduce = 0%
> 2012-04-28 18:23:08,801 Stage-1 map = 50%,  reduce = 8%
> 2012-04-28 18:23:17,839 Stage-1 map = 50%,  reduce = 17%
> 2012-04-28 18:23:19,848 Stage-1 map = 63%,  reduce = 17%
> 2012-04-28 18:23:32,909 Stage-1 map = 63%,  reduce = 21%
> 2012-04-28 18:23:57,017 Stage-1 map = 75%,  reduce = 21%
> 2012-04-28 18:24:09,075 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:25:09,397 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:26:09,688 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:27:09,980 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:28:10,262 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:29:10,522 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:30:10,742 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:31:10,985 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:32:11,238 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:33:11,467 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:34:11,731 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:35:11,968 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:36:12,213 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:37:12,508 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:38:12,747 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:39:12,970 Stage-1 map = 75%,  reduce = 25%
> 2012-04-28 18:40:13,205 Stage-1 map = 75%,  reduce = 25%
> I checked the TT log,
> 2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:47,961 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:05,987 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:08,992 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:11,997 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:15,001 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:18,006 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:21,011 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:24,015 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:27,020 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:30,025 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:33,029 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:36,034 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:39,038 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:42,043 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:45,047 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:48,051 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:51,057 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> 2012-04-28 18:33:54,062 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_000002_0
0.0%
> [hadoop@mem1 logs]$ jps
> 3282 Child
> 31547 QuorumPeerMain
> 1840 TaskTracker
> 3469 Jps
> 31070 HRegionServer
> 30120 DataNode
> [hadoop@mem1 logs]$  kill 3282
> When I kill the child process , then the job continue and complete.
> 2012-04-28 18:40:51,324 Stage-1 map = 88%,  reduce = 25%
> 2012-04-28 18:41:04,364 Stage-1 map = 88%,  reduce = 29%
> 2012-04-28 18:41:31,448 Stage-1 map = 100%,  reduce = 29%
> 2012-04-28 18:41:43,485 Stage-1 map = 100%,  reduce = 100%

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message