hadoop-common-dev mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HADOOP-3120) Large #of tasks failing at one time can effectively hang the jobtracker
Date Thu, 17 Jul 2014 21:52:04 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HADOOP-3120.

    Resolution: Incomplete

I'm going to close this as stale. 

> Large #of tasks failing at one time can effectively hang the jobtracker 
> ------------------------------------------------------------------------
>                 Key: HADOOP-3120
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3120
>             Project: Hadoop Common
>          Issue Type: Bug
>         Environment: Linux/Hadoop-15.3
>            Reporter: Pete Wyckoff
>            Priority: Minor
> We think that JobTracker.removeMarkedTasks does so much logging when this happens (i.e.
logging thousands of failed tasks per cycle) that nothing else can go on (since it's called
from a synchronized method), and thus by the next cycle the next wave of jobs has failed
and we again have tens of thousands of failures to log, and on and on.
> At least, that is what we observed: a continual printing of those failures and nothing
else happening, on and on. Of course the original jobs may have ultimately failed, but new
jobs come in to perpetuate the problem.
> This has happened to us a number of times, and since we commented out the log.info in
that method we haven't had any problems, although thousands and thousands of task failures
are hopefully not that common.

This message was sent by Atlassian JIRA
