Return-Path: X-Original-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A94C36169 for ; Tue, 2 Aug 2011 13:01:55 +0000 (UTC) Received: (qmail 18970 invoked by uid 500); 2 Aug 2011 13:01:55 -0000 Delivered-To: apmail-hadoop-mapreduce-issues-archive@hadoop.apache.org Received: (qmail 18352 invoked by uid 500); 2 Aug 2011 13:01:54 -0000 Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: mapreduce-issues@hadoop.apache.org Delivered-To: mailing list mapreduce-issues@hadoop.apache.org Received: (qmail 18344 invoked by uid 99); 2 Aug 2011 13:01:53 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 02 Aug 2011 13:01:53 +0000 X-ASF-Spam-Status: No, hits=-2000.7 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 02 Aug 2011 13:01:50 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id BC3C79AC89 for ; Tue, 2 Aug 2011 13:01:28 +0000 (UTC) Date: Tue, 2 Aug 2011 13:01:28 +0000 (UTC) From: "Subroto Sanyal (JIRA)" To: mapreduce-issues@hadoop.apache.org Message-ID: <1078938622.1315.1312290088767.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <15087182.90791291367857187.JavaMail.jira@thor> Subject: [jira] [Updated] (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/MAPREDUCE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subroto Sanyal updated MAPREDUCE-2209: -------------------------------------- Attachment: MAPREDUCE-2209.patch > TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS > ------------------------------------------------------------------------------------- > > Key: MAPREDUCE-2209 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2209 > Project: Hadoop Map/Reduce > Issue Type: Bug > Environment: hadoop version: 0.19.1 > Reporter: Liyin Liang > Priority: Blocker > Attachments: 2209-1.diff, MAPREDUCE-2209.patch > > > If a job's jar file is very large, e.g 200m+, the TaskTracker's heartbeat hang for several minutes when localizing the job. The jstack of related threads are as follows: > {code:borderStyle=solid} > "TaskLauncher for task" daemon prio=10 tid=0x0000002b05ee5000 nid=0x1adf runnable [0x0000000042e56000] > java.lang.Thread.State: RUNNABLE > at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) > - locked <0x0000002afc892ec8> (a sun.nio.ch.Util$1) > - locked <0x0000002afc892eb0> (a java.util.Collections$UnmodifiableSet) > - locked <0x0000002afc8927d8> (a sun.nio.ch.EPollSelectorImpl) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) > at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260) > at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155) > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150) > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read(BufferedInputStream.java:237) > - locked <0x0000002afce26158> (a java.io.BufferedInputStream) > at java.io.DataInputStream.readShort(DataInputStream.java:295) > at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1304) > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1556) > - locked <0x0000002afce26218> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream) > at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673) > - locked <0x0000002afce26218> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream) > at java.io.DataInputStream.read(DataInputStream.java:83) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142) > at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214) > at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195) > at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:824) > - locked <0x0000002afce2d260> (a org.apache.hadoop.mapred.TaskTracker$RunningJob) > at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1745) > at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:103) > at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1710) > "Map-events fetcher for all reduce tasks on tracker_r01a08025:localhost/127.0.0.1:50050" daemon prio=10 tid=0x0000002b05ef8000 > nid=0x1ada waiting for monitor entry [0x0000000042d55000] > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:582) > - waiting to lock <0x0000002afce2d260> (a org.apache.hadoop.mapred.TaskTracker$RunningJob) > at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:617) > - locked <0x0000002a9eefe1f8> (a java.util.TreeMap) > "IPC Server handler 2 on 50050" daemon prio=10 tid=0x0000002b050eb000 nid=0x1ab0 waiting for monitor entry [0x000000004234b000] > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:2684) > - waiting to lock <0x0000002a9eefe1f8> (a java.util.TreeMap) > - locked <0x0000002a9eac1de8> (a org.apache.hadoop.mapred.TaskTracker) > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) > "main" prio=10 tid=0x0000000040113800 nid=0x197d waiting for monitor entry [0x000000004022a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1196) > - waiting to lock <0x0000002a9eac1de8> (a org.apache.hadoop.mapred.TaskTracker) > at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1068) > at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1799) > at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2898) > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira