Return-Path: Delivered-To: apmail-hadoop-common-user-archive@www.apache.org Received: (qmail 80192 invoked from network); 5 Aug 2009 08:34:50 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 5 Aug 2009 08:34:50 -0000 Received: (qmail 28468 invoked by uid 500); 5 Aug 2009 08:34:55 -0000 Delivered-To: apmail-hadoop-common-user-archive@hadoop.apache.org Received: (qmail 28378 invoked by uid 500); 5 Aug 2009 08:34:55 -0000 Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: common-user@hadoop.apache.org Delivered-To: mailing list common-user@hadoop.apache.org Received: (qmail 28368 invoked by uid 99); 5 Aug 2009 08:34:55 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 05 Aug 2009 08:34:55 +0000 X-ASF-Spam-Status: No, hits=2.0 required=10.0 tests=NO_RDNS_DOTCOM_HELO,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: local policy) Received: from [69.147.107.21] (HELO mrout2-b.corp.re1.yahoo.com) (69.147.107.21) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 05 Aug 2009 08:34:41 +0000 Received: from EGL-EX07CAS02.ds.corp.yahoo.com (egl-ex07cas02.eglbp.corp.yahoo.com [203.83.248.209]) by mrout2-b.corp.re1.yahoo.com (8.13.8/8.13.8/y.out) with ESMTP id n758XLw4040407; Wed, 5 Aug 2009 01:33:22 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; s=serpent; d=yahoo-inc.com; c=nofws; q=dns; h=received:from:to:date:subject:thread-topic:thread-index: message-id:references:in-reply-to:accept-language: content-language:x-ms-has-attach:x-ms-tnef-correlator:acceptlanguage: content-type:content-transfer-encoding:mime-version; b=SzG8EYJiuygGeznjzpMc0Th7I9ynl8b0sMtgK36labEh1O2DF1Cxc+aSwgpSdxdC Received: from EGL-EX07VS01.ds.corp.yahoo.com ([203.83.248.205]) by EGL-EX07CAS02.ds.corp.yahoo.com ([203.83.248.216]) with mapi; Wed, 5 Aug 2009 14:03:20 +0530 From: Amogh Vasekar To: "common-user@hadoop.apache.org" , "mathias.demare@gmail.com" Date: Wed, 5 Aug 2009 14:02:22 +0530 Subject: RE: Some tasks fail to report status between the end of the map and the beginning of the merge Thread-Topic: Some tasks fail to report status between the end of the map and the beginning of the merge Thread-Index: AcoVmw3YWfSSJn9ER/iVesI/vz89egAC7FWA Message-ID: <616DA47B2EF5B944B91846785B512FF4BD3663BEFC@EGL-EX07VS01.ds.corp.yahoo.com> References: <375c60f40908050002n3a110d66ua2672a40bb8ad866@mail.gmail.com> In-Reply-To: <375c60f40908050002n3a110d66ua2672a40bb8ad866@mail.gmail.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Virus-Checked: Checked by ClamAV on apache.org 10 mins reminds me of parameter mapred.task.timeout . This is configurable.= Or alternatively you might just do a sysout to let tracker know of its exi= stence ( not an ideal solution though ) Thanks, Amogh -----Original Message----- From: Mathias De Mar=E9 [mailto:mathias.demare@gmail.com] Sent: Wednesday, August 05, 2009 12:33 PM To: common-user@hadoop.apache.org Subject: Some tasks fail to report status between the end of the map and th= e beginning of the merge Hi, I'm having some problems (Hadoop 0.20.0) where map tasks fail to report status for 10 minutes and get killed eventually. All of the tasks output around the same amount of data, some only take a few seconds before startin= g the 'merge' on the segments, but some seem to fail by just stopping to work for about 10 minutes. Several of these failed tasks eventually do succeed, on their 2nd or 3rd task attempt. It's nice to see them succeed eventually, but each of those tasks crawls a few thousand websites, and it seems like a terrible waste to let them retry a few times, in the meantime downloading all of those websites again, after 10 minutes of doing nothing. Even more annoyingly, eventually, one of the tasks fails completely, which then kills the entire job. I could probably increase the amount of task attempts and simply hope the tasks will succeed eventually, but that doesn't solve the huge slowdowns an= d the recrawling required. Here's an example of a successful task attempt (this is attempt 3 of a specific task -- note that it takes around 8 seconds between spill 133 and spill 134): 2009-08-04 18:38:48,059 INFO org.apache.hadoop.mapred.MapTask: Finished spill 129 2009-08-04 18:39:00,626 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full =3D true 2009-08-04 18:39:00,626 INFO org.apache.hadoop.mapred.MapTask: bufstart =3D 2492163; bufend =3D 628913; bufvoid =3D 2988446 2009-08-04 18:39:00,626 INFO org.apache.hadoop.mapred.MapTask: kvstart =3D 9727; kvend =3D 7760; length =3D 9830 2009-08-04 18:39:01,467 INFO org.apache.hadoop.mapred.MapTask: Finished spill 130 2009-08-04 18:39:08,136 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full =3D true 2009-08-04 18:39:08,136 INFO org.apache.hadoop.mapred.MapTask: bufstart =3D 628913; bufend =3D 1880222; bufvoid =3D 2988448 2009-08-04 18:39:08,136 INFO org.apache.hadoop.mapred.MapTask: kvstart =3D 7760; kvend =3D 5793; length =3D 9830 2009-08-04 18:39:08,463 INFO org.apache.hadoop.mapred.MapTask: Finished spill 131 2009-08-04 18:39:12,456 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full =3D true 2009-08-04 18:39:12,459 INFO org.apache.hadoop.mapred.MapTask: bufstart =3D 1880222; bufend =3D 136018; bufvoid =3D 2988448 2009-08-04 18:39:12,459 INFO org.apache.hadoop.mapred.MapTask: kvstart =3D 5793; kvend =3D 3826; length =3D 9830 2009-08-04 18:39:12,697 INFO org.apache.hadoop.mapred.MapTask: Finished spill 132 2009-08-04 18:39:23,138 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full =3D true 2009-08-04 18:39:23,138 INFO org.apache.hadoop.mapred.MapTask: bufstart =3D 136018; bufend =3D 1347353; bufvoid =3D 2988448 2009-08-04 18:39:23,138 INFO org.apache.hadoop.mapred.MapTask: kvstart =3D 3826; kvend =3D 1859; length =3D 9830 2009-08-04 18:39:25,747 INFO org.apache.hadoop.mapred.MapTask: Finished spill 133 2009-08-04 18:47:49,823 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output 2009-08-04 18:47:50,132 INFO org.apache.hadoop.mapred.MapTask: Finished spill 134 2009-08-04 18:47:50,525 INFO org.apache.hadoop.mapred.Merger: Merging 135 sorted segments 2009-08-04 18:47:50,528 INFO org.apache.hadoop.mapred.Merger: Merging 9 intermediate segments out of a total of 135 2009-08-04 18:47:52,224 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 127 2009-08-04 18:47:53,837 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 118 2009-08-04 18:47:55,417 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 109 2009-08-04 18:47:56,990 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 100 2009-08-04 18:47:58,492 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 91 2009-08-04 18:48:00,191 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 82 2009-08-04 18:48:02,315 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 73 2009-08-04 18:48:04,184 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 64 2009-08-04 18:48:06,162 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 55 2009-08-04 18:48:08,149 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 46 2009-08-04 18:48:09,888 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 37 2009-08-04 18:48:11,744 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 28 2009-08-04 18:48:13,544 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 19 2009-08-04 18:48:21,177 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 153426836 bytes 2009-08-04 18:48:46,226 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_200908041626_0001_m_000133_2 is done. And is in the process of commiting 2009-08-04 18:48:46,373 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_200908041626_0001_m_000133_2' done. And here the same task, but an earlier attempt (it only goes up to spill 132, but those do vary slightly in size, right?): 2009-08-04 17:15:49,112 INFO org.apache.hadoop.mapred.MapTask: bufstart =3D 321905; bufend =3D 1464691; bufvoid =3D 2988448 2009-08-04 17:15:49,112 INFO org.apache.hadoop.mapred.MapTask: kvstart =3D 7764; kvend =3D 5797; length =3D 9830 2009-08-04 17:15:49,910 INFO org.apache.hadoop.mapred.MapTask: Finished spill 126 2009-08-04 17:16:00,785 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full =3D true 2009-08-04 17:16:00,785 INFO org.apache.hadoop.mapred.MapTask: bufstart =3D 1464691; bufend =3D 2580856; bufvoid =3D 2988448 2009-08-04 17:16:00,785 INFO org.apache.hadoop.mapred.MapTask: kvstart =3D 5797; kvend =3D 3830; length =3D 9830 2009-08-04 17:16:05,728 INFO org.apache.hadoop.mapred.MapTask: Finished spill 127 2009-08-04 17:16:20,883 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full =3D true 2009-08-04 17:16:20,883 INFO org.apache.hadoop.mapred.MapTask: bufstart =3D 2580856; bufend =3D 1000978; bufvoid =3D 2988448 2009-08-04 17:16:20,883 INFO org.apache.hadoop.mapred.MapTask: kvstart =3D 3830; kvend =3D 1863; length =3D 9830 2009-08-04 17:16:21,709 INFO org.apache.hadoop.mapred.MapTask: Finished spill 128 2009-08-04 17:16:24,461 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full =3D true 2009-08-04 17:16:24,461 INFO org.apache.hadoop.mapred.MapTask: bufstart =3D 1000978; bufend =3D 2246166; bufvoid =3D 2988448 2009-08-04 17:16:24,461 INFO org.apache.hadoop.mapred.MapTask: kvstart =3D 1863; kvend =3D 9727; length =3D 9830 2009-08-04 17:16:24,934 INFO org.apache.hadoop.mapred.MapTask: Finished spill 129 2009-08-04 17:16:40,853 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full =3D true 2009-08-04 17:16:40,853 INFO org.apache.hadoop.mapred.MapTask: bufstart =3D 2246166; bufend =3D 374750; bufvoid =3D 2988435 2009-08-04 17:16:40,853 INFO org.apache.hadoop.mapred.MapTask: kvstart =3D 9727; kvend =3D 7760; length =3D 9830 2009-08-04 17:16:43,740 INFO org.apache.hadoop.mapred.MapTask: Finished spill 130 2009-08-04 17:16:49,141 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full =3D true 2009-08-04 17:16:49,141 INFO org.apache.hadoop.mapred.MapTask: bufstart =3D 374750; bufend =3D 1653302; bufvoid =3D 2988448 2009-08-04 17:16:49,141 INFO org.apache.hadoop.mapred.MapTask: kvstart =3D 7760; kvend =3D 5793; length =3D 9830 2009-08-04 17:16:49,868 INFO org.apache.hadoop.mapred.MapTask: Finished spill 131 2009-08-04 17:17:04,902 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full =3D true 2009-08-04 17:17:04,902 INFO org.apache.hadoop.mapred.MapTask: bufstart =3D 1653302; bufend =3D 2882491; bufvoid =3D 2988448 2009-08-04 17:17:04,902 INFO org.apache.hadoop.mapred.MapTask: kvstart =3D 5793; kvend =3D 3826; length =3D 9830 2009-08-04 17:17:05,355 INFO org.apache.hadoop.mapred.MapTask: Finished spill 132 There's no more output after that last spill, since the task eventually got killed after 10 minutes of nothing. And in case it can help, here's a thread dump for one of those failed tasks= : 2009-08-04 18:53:23,460 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200908041626_0001_m_000135_3: Task attempt_200908041626_0001_m_000135_3 failed to report status for 602 seconds. Killing! 2009-08-04 18:53:23,470 INFO org.apache.hadoop.mapred.TaskTracker: Process Thread Dump: lost task 31 active threads Thread 1673 (process reaper): State: RUNNABLE Blocked count: 0 Waited count: 0 Stack: java.lang.UNIXProcess.waitForProcessExit(Native Method) java.lang.UNIXProcess.access$900(UNIXProcess.java:20) java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132) Thread 1672 (JVM Runner jvm_200908041626_0001_m_1938204689 spawned.): State: WAITING Blocked count: 1 Waited count: 2 Waiting on java.lang.UNIXProcess@155d434 Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) java.lang.UNIXProcess.waitFor(UNIXProcess.java:165) org.apache.hadoop.util.Shell.runCommand(Shell.java:186) org.apache.hadoop.util.Shell.run(Shell.java:134) org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286) org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(Jv= mManager.java:335) org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmMana= ger.java:324) Thread 1671 (Thread-1040): State: WAITING Blocked count: 0 Waited count: 1 Waiting on java.lang.Object@1ba4552 Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409) Thread 1646 (process reaper): State: RUNNABLE Blocked count: 0 Waited count: 0 Stack: java.lang.UNIXProcess.waitForProcessExit(Native Method) java.lang.UNIXProcess.access$900(UNIXProcess.java:20) java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132) Thread 1645 (JVM Runner jvm_200908041626_0001_m_2100332869 spawned.): State: WAITING Blocked count: 1 Waited count: 2 Waiting on java.lang.UNIXProcess@5e34ce Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) java.lang.UNIXProcess.waitFor(UNIXProcess.java:165) org.apache.hadoop.util.Shell.runCommand(Shell.java:186) org.apache.hadoop.util.Shell.run(Shell.java:134) org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286) org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(Jv= mManager.java:335) org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmMana= ger.java:324) Thread 1644 (Thread-1023): State: WAITING Blocked count: 0 Waited count: 1 Waiting on java.lang.Object@1bee6f Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409) Thread 143 (process reaper): State: RUNNABLE Blocked count: 0 Waited count: 0 Stack: java.lang.UNIXProcess.waitForProcessExit(Native Method) java.lang.UNIXProcess.access$900(UNIXProcess.java:20) java.lang.UNIXProcess$1$1.run(UNIXProcess.java:132) Thread 141 (JVM Runner jvm_200908041626_0001_r_-836397802 spawned.): State: WAITING Blocked count: 1 Waited count: 2 Waiting on java.lang.UNIXProcess@11fceed Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) java.lang.UNIXProcess.waitFor(UNIXProcess.java:165) org.apache.hadoop.util.Shell.runCommand(Shell.java:186) org.apache.hadoop.util.Shell.run(Shell.java:134) org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286) org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(Jv= mManager.java:335) org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmMana= ger.java:324) Thread 140 (Thread-89): State: WAITING Blocked count: 0 Waited count: 1 Waiting on java.lang.Object@1c63791 Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:409) Thread 33 (IPC Client (47) connection to localhost/127.0.0.1:9001 from root): State: TIMED_WAITING Blocked count: 5563 Waited count: 5552 Stack: java.lang.Object.wait(Native Method) org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:403) org.apache.hadoop.ipc.Client$Connection.run(Client.java:445) Thread 32 (Directory/File cleanup thread): State: WAITING Blocked count: 0 Waited count: 165 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@fdffb= 1 Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await= (AbstractQueuedSynchronizer.java:1925) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) org.apache.hadoop.mapred.CleanupQueue$PathCleanupThread.run(CleanupQueue.ja= va:89) Thread 8 (taskCleanup): State: WAITING Blocked count: 0 Waited count: 2 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@19c20= 5b Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await= (AbstractQueuedSynchronizer.java:1925) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:352) java.lang.Thread.run(Thread.java:619) Thread 31 (TaskLauncher for task): State: WAITING Blocked count: 4 Waited count: 5 Waiting on java.util.LinkedList@1d9d565 Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1662= ) Thread 30 (TaskLauncher for task): State: WAITING Blocked count: 783 Waited count: 781 Waiting on java.util.LinkedList@1fbebee Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1662= ) Thread 29 (Map-events fetcher for all reduce tasks on tracker_ip-10-244-145-237.ec2.internal:localhost/127.0.0.1:40135): State: TIMED_WAITING Blocked count: 10965 Waited count: 10957 Stack: java.lang.Object.wait(Native Method) org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker= .java:671) Thread 27 (IPC Server handler 7 on 40135): State: WAITING Blocked count: 5 Waited count: 2565 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da6= 90 Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await= (AbstractQueuedSynchronizer.java:1925) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) Thread 26 (IPC Server handler 6 on 40135): State: WAITING Blocked count: 11 Waited count: 2565 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da6= 90 Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await= (AbstractQueuedSynchronizer.java:1925) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) Thread 25 (IPC Server handler 5 on 40135): State: WAITING Blocked count: 11 Waited count: 2566 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da6= 90 Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await= (AbstractQueuedSynchronizer.java:1925) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) Thread 24 (IPC Server handler 4 on 40135): State: WAITING Blocked count: 7 Waited count: 2565 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da6= 90 Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await= (AbstractQueuedSynchronizer.java:1925) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) Thread 23 (IPC Server handler 3 on 40135): State: WAITING Blocked count: 10 Waited count: 2565 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da6= 90 Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await= (AbstractQueuedSynchronizer.java:1925) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) Thread 22 (IPC Server handler 2 on 40135): State: WAITING Blocked count: 11 Waited count: 2566 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da6= 90 Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await= (AbstractQueuedSynchronizer.java:1925) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) Thread 21 (IPC Server handler 1 on 40135): State: WAITING Blocked count: 15 Waited count: 2567 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da6= 90 Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await= (AbstractQueuedSynchronizer.java:1925) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) Thread 20 (IPC Server handler 0 on 40135): State: WAITING Blocked count: 12 Waited count: 2565 Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@17da6= 90 Stack: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await= (AbstractQueuedSynchronizer.java:1925) java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) org.apache.hadoop.ipc.Server$Handler.run(Server.java:939) Thread 17 (IPC Server listener on 40135): State: RUNNABLE Blocked count: 0 Waited count: 0 Stack: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84) org.apache.hadoop.ipc.Server$Listener.run(Server.java:318) Thread 19 (IPC Server Responder): State: RUNNABLE Blocked count: 0 Waited count: 0 Stack: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) org.apache.hadoop.ipc.Server$Responder.run(Server.java:478) Thread 16 (Timer-0): State: TIMED_WAITING Blocked count: 1 Waited count: 298 Stack: java.lang.Object.wait(Native Method) java.util.TimerThread.mainLoop(Timer.java:509) java.util.TimerThread.run(Timer.java:462) Thread 15 (11502424@qtp0-0 - Acceptor0 SelectChannelConnector@0.0.0.0:50060 ): State: RUNNABLE Blocked count: 50 Waited count: 1 Stack: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:= 429) org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:185) org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.= java:124) org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:707= ) org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:52= 2) Thread 4 (Signal Dispatcher): State: RUNNABLE Blocked count: 0 Waited count: 0 Stack: Thread 3 (Finalizer): State: WAITING Blocked count: 424 Waited count: 425 Waiting on java.lang.ref.ReferenceQueue$Lock@7b6d1c Stack: java.lang.Object.wait(Native Method) java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) Thread 2 (Reference Handler): State: WAITING Blocked count: 445 Waited count: 446 Waiting on java.lang.ref.Reference$Lock@16ee240 Stack: java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:485) java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) Thread 1 (main): State: RUNNABLE Blocked count: 2973 Waited count: 5789 Stack: sun.management.ThreadImpl.getThreadInfo0(Native Method) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:147) sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:123) org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java= :149) org.apache.hadoop.util.ReflectionUtils.logThreadInfo(ReflectionUtils.java:2= 03) org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java= :1383) org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1129= ) org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1779) org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2881) 2009-08-04 18:53:23,471 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_200908041626_0001_m_000135_3 2009-08-04 18:53:23,471 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 3 2009-08-04 18:53:23,471 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200908041626_0001_m_000135_3 done; removing files. 2009-08-04 18:53:23,550 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_200908041626_0001_m_000135_3 not found in cache Thanks in advance for any response! I'm afraid I'm somewhat clueless about this. Mathias