hadoop-common-dev mailing list archives

From "Pete Wyckoff (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-3120) Large #of tasks failing at one time can effectively hang the jobtracker
Date Fri, 28 Mar 2008 18:16:24 GMT
Large # of tasks failing at one time can effectively hang the jobtracker

                 Key: HADOOP-3120
                 URL: https://issues.apache.org/jira/browse/HADOOP-3120
             Project: Hadoop Core
          Issue Type: Bug
         Environment: Linux/Hadoop-15.3
            Reporter: Pete Wyckoff
            Priority: Minor

We think that JobTracker.removeMarkedTasks does so much logging when this happens (i.e., logging
thousands of failed tasks per cycle) that nothing else can proceed, since it is called from a
synchronized method. By the next cycle the next wave of jobs has failed, so there are again tens
of thousands of failures to log, and so on.

At least, the above is what we observed: a continual printing of those failures, on and on, with
nothing else happening. The original jobs may ultimately fail, of course, but new jobs come in
and perpetuate the problem.

This has happened to us a number of times, and since we commented out the LOG.info call in that
method we haven't had any problems, although thousands and thousands of task failures are
hopefully not that common.
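The mechanism described above can be sketched as follows. This is a minimal, hypothetical Java
illustration (class and method names are invented, not the actual JobTracker code): when each
failed task is logged individually inside a synchronized method, the monitor is held for the
entire O(n) logging pass, blocking every other caller; logging a single per-cycle summary holds
the lock only briefly.

```java
import java.util.ArrayList;
import java.util.List;

public class RemoveMarkedSketch {
    private final List<String> failedTasks = new ArrayList<>();
    private int logLines = 0; // stands in for lines emitted via LOG.info

    public synchronized void addFailure(String taskId) {
        failedTasks.add(taskId);
    }

    // Per-task logging: one log line per failed task, all emitted while
    // the monitor is held. With tens of thousands of failures per cycle,
    // other synchronized JobTracker methods block behind this loop.
    public synchronized int removeMarkedPerTask() {
        for (String t : failedTasks) {
            logLines++; // stands in for LOG.info("Removed task " + t)
        }
        int n = failedTasks.size();
        failedTasks.clear();
        return n;
    }

    // Summarized logging: one log line per cycle, so the time spent
    // holding the monitor no longer scales with the number of failures.
    public synchronized int removeMarkedSummarized() {
        int n = failedTasks.size();
        failedTasks.clear();
        logLines++; // stands in for LOG.info("Removed " + n + " tasks")
        return n;
    }

    public int getLogLines() {
        return logLines;
    }

    public static void main(String[] args) {
        RemoveMarkedSketch perTask = new RemoveMarkedSketch();
        for (int i = 0; i < 10000; i++) perTask.addFailure("task_" + i);
        perTask.removeMarkedPerTask();
        System.out.println(perTask.getLogLines()); // 10000 lines under the lock

        RemoveMarkedSketch summary = new RemoveMarkedSketch();
        for (int i = 0; i < 10000; i++) summary.addFailure("task_" + i);
        summary.removeMarkedSummarized();
        System.out.println(summary.getLogLines()); // 1 line under the lock
    }
}
```

Commenting out the per-task LOG.info, as described above, corresponds to dropping the loop body
in the first method; summarizing is a middle ground that keeps some visibility without the O(n)
lock-held logging.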

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
