hadoop-common-dev mailing list archives

From "schubert zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5367) After some jobs have finished, Reducer will run new job's reduce tasks sequentially and not in parallel (mapred.JobTracker: Serious problem. While updating status, cannot find taskid...)
Date Tue, 17 Mar 2009 15:28:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682698#action_12682698 ]

schubert zhang commented on HADOOP-5367:
----------------------------------------

I am also hitting this issue.

After MapReduce has been running for a long time (about 200 jobs completed), a MapReduce job
hangs forever.
(1) The JobTracker keeps logging:
2009-03-17 16:29:39,997 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200903171247_0387_m_000015_1

(2) The job cannot complete and stays stuck at 79% forever.

(3) All TaskTrackers may have hung at the same time, since the logs of every TaskTracker stop
at that time.


Before the hang, I can also find scattered occurrences of the same message in the JobTracker log, such as:
2009-03-17 16:29:21,767 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200903171247_0387_m_000015_1


Another observation:
At one point, I found that the task slots could not be filled to capacity; maybe some slots had also hung.
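
As a rough aid for checking how widespread the message is, the following sketch tallies the
"cannot find taskid" lines per task attempt and per job. It is only an illustration: the
function names count_missing_attempts and count_by_job are hypothetical, the regular
expression is derived solely from the log lines quoted in this thread, and the log is
assumed to be available as a single string.

```python
import re
from collections import Counter

# Pattern derived from the JobTracker lines quoted in this thread (an assumption,
# not taken from the Hadoop source). \s+ tolerates the double space after
# "problem." and any line wrapping introduced by mail clients.
MISSING_TASKID = re.compile(
    r"JobTracker:\s+Serious\s+problem\.\s+While\s+updating\s+status,"
    r"\s+cannot\s+find\s+taskid\s+(attempt_\d+_\d+_[mr]_\d+_\d+)"
)


def count_missing_attempts(log_text):
    """Map each attempt ID to the number of 'cannot find taskid' messages."""
    return Counter(MISSING_TASKID.findall(log_text))


def count_by_job(attempt_counts):
    """Aggregate per-attempt counts by the job ID embedded in the attempt ID."""
    by_job = Counter()
    for attempt_id, n in attempt_counts.items():
        # attempt_<jobtracker start>_<job number>_<m|r>_<task>_<attempt>
        parts = attempt_id.split("_")
        by_job["job_%s_%s" % (parts[1], parts[2])] += n
    return by_job
```

For example, passing the contents of the JobTracker log to count_missing_attempts and then
calling most_common() on the result lists the attempts that are reported most often.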

> After some jobs have finished, Reducer will run new job's reduce tasks sequentially and not in parallel (mapred.JobTracker: Serious problem.  While updating status, cannot find taskid...)
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5367
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5367
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: State: RUNNING
> Started: Fri Feb 27 17:00:07 CET 2009
> Version: 0.19.1, r745977
> Compiled: Fri Feb 20 00:16:34 UTC 2009 by ndaley
>            Reporter: Thibaut
>            Priority: Critical
>
> Hi,
> After a while, my cluster will only run the reduce tasks sequentially (each reducer running
> on the same node), while the other nodes stay empty. The map phase, however, will run on
> all the nodes, even after such a "long" reduce phase has completed. But the reduce phase will
> then again be executed sequentially. This happens in my cluster after about 160 successfully
> completed jobs. (Some jobs have the number of reducers set to 0!)
> As a workaround, I have to restart the mapreduce service.
> I didn't notice this behaviour in version 0.19.0. I can't use version 0.19.0 because
> of the multipleoutput bug when setting reducers to 0.
> Another side note which might be related: I also tried running the jobs with speculative
> execution turned on. My cluster would always hold back one reducer and only run it (in multiple
> instances) after the first of the other 6 reducers had finished, instead of launching all
> of them at the same time.
> Below is a short extract from the related logfile. It's full of entries like these.
> 09/02/28 12:48:07 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0051_r_000006_1
> 09/02/28 12:48:08 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0041_r_000002_1
> 09/02/28 12:48:08 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0083_r_000006_1
> 09/02/28 12:48:08 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0041_r_000005_1
> 09/02/28 12:48:10 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0105_r_000006_1
> 09/02/28 12:48:10 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0102_r_000006_1
> 09/02/28 12:48:12 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0051_r_000006_1
> 09/02/28 12:48:13 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0041_r_000002_1
> 09/02/28 12:48:13 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0083_r_000006_1
> 09/02/28 12:48:13 INFO mapred.JobTracker: Serious problem.  While updating status, cannot find taskid attempt_200902271700_0041_r_000005_1

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

