hadoop-common-dev mailing list archives

From "Christian Kunz (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-3813) RPC queue overload of JobTracker
Date Tue, 22 Jul 2008 23:07:33 GMT
RPC queue overload of JobTracker
--------------------------------

                 Key: HADOOP-3813
                 URL: https://issues.apache.org/jira/browse/HADOOP-3813
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.17.1
            Reporter: Christian Kunz


On a cluster with about 1700 nodes, a job with about 100,000 maps and 10,000 reduces
completed. During promotion of the job, the JobTracker, even with 80 handlers, could not
keep up with the RPC call load. Heartbeats were discarded, and by the end the JobTracker
had lost nearly all TaskTrackers (about 10 TaskTrackers were left). Promotion took more
than 40 minutes. The TaskTrackers reconnected and everything recovered, but this might
have been just luck.
Shouldn't there be adaptive throttling of the rate of heartbeats and TaskCompletionEvents calls?
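
To make the throttling question concrete, here is a minimal sketch of one possible policy, not existing Hadoop code. It assumes the server can see how full its RPC call queue is and can tell each client how long to wait before its next heartbeat. The class name, the method name, the 3 s and 60 s bounds, and the 25%/75% thresholds are all hypothetical values chosen for illustration.

```java
/**
 * Hypothetical sketch of adaptive heartbeat throttling; not part of Hadoop.
 * The server lengthens the requested heartbeat interval when its call queue
 * is nearly full and shortens it again once the queue drains.
 */
final class AdaptiveHeartbeat {
    // Assumed bounds on the interval; real values would need tuning.
    static final long MIN_INTERVAL_MS = 3_000;
    static final long MAX_INTERVAL_MS = 60_000;
    // Assumed step used to shorten the interval when the queue is mostly empty.
    static final long RECOVERY_STEP_MS = 500;

    private AdaptiveHeartbeat() {}

    /**
     * Computes the next heartbeat interval from the current one and the
     * current occupancy of the call queue.
     */
    static long nextIntervalMs(long currentMs, int queueLength, int queueCapacity) {
        if (queueCapacity <= 0) {
            throw new IllegalArgumentException("queueCapacity must be positive");
        }
        double fill = (double) queueLength / queueCapacity;
        long next;
        if (fill > 0.75) {
            next = currentMs * 2;                // queue nearly full: back off quickly
        } else if (fill < 0.25) {
            next = currentMs - RECOVERY_STEP_MS; // queue mostly empty: recover slowly
        } else {
            next = currentMs;                    // in between: hold steady
        }
        // Clamp so the interval never leaves the assumed bounds.
        return Math.max(MIN_INTERVAL_MS, Math.min(MAX_INTERVAL_MS, next));
    }
}
```

Doubling under load and recovering in small steps is a deliberate choice in this sketch: the call rate drops quickly when the queue is close to overflowing and creeps back up only once the queue has drained. The same idea could be applied to the polling rate of getTaskCompletionEvents.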

Sample messages:
2008-07-22 18:21:55,831 WARN org.apache.hadoop.ipc.Server: Call queue overflow discarding
oldest call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@115f6b6, false, true, 18137)
from xxx
2008-07-22 18:21:55,834 WARN org.apache.hadoop.ipc.Server: Call queue overflow discarding oldest
call getTaskCompletionEvents(job_200807190635_0012, 119567, 50) from yyy
...
2008-07-22 19:02:28,821 WARN org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9020, call
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@19d32fa, false, true, 18199) from zzz:
discarded for being too old (40936)
2008-07-22 19:02:28,821 WARN org.apache.hadoop.ipc.Server: IPC Server handler 34 on 9020,
call getTaskCompletionEvents(job_200807190635_0012, 119567, 50) from uuu: discarded for being
too old (40978)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

