Return-Path: X-Original-To: apmail-hadoop-general-archive@minotaur.apache.org Delivered-To: apmail-hadoop-general-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 517FCE57D for ; Fri, 4 Jan 2013 15:03:17 +0000 (UTC) Received: (qmail 69892 invoked by uid 500); 4 Jan 2013 15:03:15 -0000 Delivered-To: apmail-hadoop-general-archive@hadoop.apache.org Received: (qmail 69585 invoked by uid 500); 4 Jan 2013 15:03:15 -0000 Mailing-List: contact general-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: general@hadoop.apache.org Delivered-To: mailing list general@hadoop.apache.org Received: (qmail 69562 invoked by uid 99); 4 Jan 2013 15:03:14 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 04 Jan 2013 15:03:14 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of royston.sellman@googlemail.com designates 74.125.82.182 as permitted sender) Received: from [74.125.82.182] (HELO mail-we0-f182.google.com) (74.125.82.182) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 04 Jan 2013 15:03:08 +0000 Received: by mail-we0-f182.google.com with SMTP id u54so7954432wey.27 for ; Fri, 04 Jan 2013 07:02:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:content-type:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to:x-mailer; bh=gBUbBAMFUtggP0oKw45Qme7oiKA6ROsTSFLVcjOf+kA=; b=oTa8oMlCqmkDVQ7ZznAmYujrTwKVOE62ahBaseHBgq2EFO0vsbPaOh4fphyydwVe7Z ImCfv5InkiwzQXypSoIgQsgL/fM4+vOJlGKCLuvvDPDYsRh9cFqTue5TgGf4DQ3TcKvq r6lqiHMe/lOuDSBTpDdYSw/Q5jxfI+oe3zPnXag931qFTWg/tbMhaNf/pG6m0n31kKHl NkRPDvL9wo836EN8MhcucITm7Lod88keuoNc2VUmCoHq17d3otjZCp3oCmRWiWNw0jme U2/GcoFRYpaAxffuzEBdImIrraPrUi8dq6fFsod/4MW5CPIYAD4RmL34XRZiVcrC6BjN WYmQ== X-Received: by 10.194.85.234 with SMTP id k10mr84037616wjz.53.1357311766833; Fri, 04 Jan 2013 07:02:46 -0800 (PST) Received: from [192.168.1.34] (host86-175-1-239.wlms-broadband.com. [86.175.1.239]) by mx.google.com with ESMTPS id bd6sm63037052wib.10.2013.01.04.07.02.45 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 04 Jan 2013 07:02:46 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Lost tasktracker errors From: Royston Sellman In-Reply-To: Date: Fri, 4 Jan 2013 15:02:44 +0000 Content-Transfer-Encoding: quoted-printable Message-Id: References: To: general@hadoop.apache.org X-Mailer: Apple Mail (2.1499) X-Virus-Checked: Checked by ClamAV on apache.org Hi Bobby, Thanks for the response Bobby, The tasktracker logs such as "hadoop-hdfs-tasktracker-hd-37-03.log" = contained the log messages included in our previous message. It seems to = show a series of successful map attempts with a few reduce attempts = interspersed, then it gets to a point and only shows a series of reduce = attempts that appear to be stuck at the same level of progress, before = outputting the 143 exit code and the interrupted sleep message at the = end. There is nothing in the tasktracker~.out files... The machines did not go down but the affected TTs did not log anything = till I got up in the morning, saw the job had frozen, did stop-all.sh. = Then the stalled TTs logged the shutdown. The disks are not full (67% usage across 12 disks per worker). It seems that the 143 exit code indicates that an external process has = terminated our tasktracker JVM. Is this correct? If so, what would the likely suspects be that would terminate our = tasktrackers? Is it possible this is related to our operating system = (Scientific Linux 6) and PAM limits? We had already increased our hard limit on the number of open files for = the "hdfs" user (that launches hdfs and mapred daemons) to 32768 in the = hope that this would solve the issue. Can you see anything wrong with = our security limits: [hdfs@hd-37-03 hdfs]$ ulimit -aH core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 191988 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 32768 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) unlimited cpu time (seconds, -t) unlimited max user processes (-u) unlimited virtual memory (kbytes, -v) unlimited file locks (-x) unlimited Thanks for your help. Royston On 4 Jan 2013, at 14:34, Robert Evans wrote: > Is there anything in the task tracker's logs? Did the machines go = down? > Are there full disks on those nodes? >=20 > --Bobby >=20 > On 1/4/13 5:52 AM, "Royston Sellman" > wrote: >=20 >> I'm running a job over a 380 billion row 20 TB dataset which is = computing >> sum(), max() etc. The job is running fine at around 3 million rows = per >> second for several hours then grinding to a halt as it loses one = after >> another of the tasktrackers. We see a healthy mix of successful map = and >> reduce attempts on the tasktracker... >>=20 >>=20 >>=20 >> 2013-01-03 23:41:40,249 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041109_0 1.0% >>=20 >> 2013-01-03 23:41:40,256 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041105_0 1.0% >>=20 >> 2013-01-03 23:41:40,260 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041105_0 1.0% >>=20 >> 2013-01-03 23:41:40,261 INFO org.apache.hadoop.mapred.TaskTracker: = Task >> attempt_201301031813_0001_m_041105_0 is done. >>=20 >> 2013-01-03 23:41:40,261 INFO org.apache.hadoop.mapred.TaskTracker: >> reported >> output size for attempt_201301031813_0001_m_041105_0 was 111 >>=20 >> 2013-01-03 23:41:40,261 INFO org.apache.hadoop.mapred.TaskTracker: >> addFreeSlot : current free slots : 8 >>=20 >> 2013-01-03 23:41:40,374 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041106_0 0.9884119% >>=20 >> 2013-01-03 23:41:40,432 INFO org.apache.hadoop.mapred.JvmManager: JVM = : >> jvm_201301031813_0001_m_2021872807 exited with exit code 0. Number of >> tasks >> it ran: 1 >>=20 >> 2013-01-03 23:41:40,807 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041103_0 0.9884134% >>=20 >> 2013-01-03 23:41:43,190 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041101_0 1.0% >>=20 >> 2013-01-03 23:41:43,193 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041101_0 1.0% >>=20 >> 2013-01-03 23:41:43,194 INFO org.apache.hadoop.mapred.TaskTracker: = Task >> attempt_201301031813_0001_m_041101_0 is done. >>=20 >> 2013-01-03 23:41:43,194 INFO org.apache.hadoop.mapred.TaskTracker: >> reported >> output size for attempt_201301031813_0001_m_041101_0 was 111 >>=20 >> 2013-01-03 23:41:43,194 INFO org.apache.hadoop.mapred.TaskTracker: >> addFreeSlot : current free slots : 9 >>=20 >> 2013-01-03 23:41:43,303 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041109_0 1.0% >>=20 >> 2013-01-03 23:41:43,306 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041109_0 1.0% >>=20 >> 2013-01-03 23:41:43,307 INFO org.apache.hadoop.mapred.TaskTracker: = Task >> attempt_201301031813_0001_m_041109_0 is done. >>=20 >> 2013-01-03 23:41:43,307 INFO org.apache.hadoop.mapred.TaskTracker: >> reported >> output size for attempt_201301031813_0001_m_041109_0 was 111 >>=20 >> 2013-01-03 23:41:43,307 INFO org.apache.hadoop.mapred.TaskTracker: >> addFreeSlot : current free slots : 10 >>=20 >> 2013-01-03 23:41:43,361 INFO org.apache.hadoop.mapred.JvmManager: JVM = : >> jvm_201301031813_0001_m_36690963 exited with exit code 0. Number of = tasks >> it >> ran: 1 >>=20 >> 2013-01-03 23:41:43,428 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041106_0 1.0% >>=20 >> 2013-01-03 23:41:43,432 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041106_0 1.0% >>=20 >> 2013-01-03 23:41:43,433 INFO org.apache.hadoop.mapred.TaskTracker: = Task >> attempt_201301031813_0001_m_041106_0 is done. >>=20 >> 2013-01-03 23:41:43,433 INFO org.apache.hadoop.mapred.TaskTracker: >> reported >> output size for attempt_201301031813_0001_m_041106_0 was 111 >>=20 >> 2013-01-03 23:41:43,433 INFO org.apache.hadoop.mapred.TaskTracker: >> addFreeSlot : current free slots : 11 >>=20 >> 2013-01-03 23:41:43,457 INFO org.apache.hadoop.mapred.JvmManager: JVM = : >> jvm_201301031813_0001_m_-2095784622 exited with exit code 0. Number = of >> tasks >> it ran: 1 >>=20 >> 2013-01-03 23:41:43,595 INFO org.apache.hadoop.mapred.JvmManager: JVM = : >> jvm_201301031813_0001_m_1190449426 exited with exit code 0. Number of >> tasks >> it ran: 1 >>=20 >> 2013-01-03 23:41:43,862 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041103_0 1.0% >>=20 >> 2013-01-03 23:41:43,866 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_m_041103_0 1.0% >>=20 >> 2013-01-03 23:41:43,867 INFO org.apache.hadoop.mapred.TaskTracker: = Task >> attempt_201301031813_0001_m_041103_0 is done. >>=20 >> 2013-01-03 23:41:43,867 INFO org.apache.hadoop.mapred.TaskTracker: >> reported >> output size for attempt_201301031813_0001_m_041103_0 was 111 >>=20 >> 2013-01-03 23:41:43,867 INFO org.apache.hadoop.mapred.TaskTracker: >> addFreeSlot : current free slots : 12 >>=20 >> 2013-01-03 23:41:44,021 INFO org.apache.hadoop.mapred.JvmManager: JVM = : >> jvm_201301031813_0001_m_-505301168 exited with exit code 0. Number of >> tasks >> it ran: 1 >>=20 >> 2013-01-03 23:41:45,539 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050235882% reduce > copy = (40975 of >> 271884 at 0.00 MB/s) > >>=20 >>=20 >>=20 >>=20 >>=20 >> Then it seems to get stuck on reduce attempts, before exiting with a >> SIG_TERM (143) exit code... >>=20 >>=20 >>=20 >> 2013-01-03 23:50:49,642 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:50:55,678 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:01,717 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:04,755 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:10,796 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:16,831 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:19,870 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:25,911 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:31,953 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:34,987 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:41,023 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:47,063 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:50,102 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:51:56,143 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:02,179 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:05,213 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:11,254 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:17,295 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:20,334 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:26,375 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:32,411 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:35,445 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:41,486 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:44,526 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:50,567 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:56,608 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:52:59,648 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:53:05,689 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:53:11,730 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:53:14,764 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:53:20,804 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:53:26,844 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:53:29,883 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:53:35,924 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:53:41,964 INFO org.apache.hadoop.mapred.TaskTracker: >> attempt_201301031813_0001_r_000000_0 0.050278794% reduce > copy = (41010 of >> 271884 at 0.00 MB/s) > >>=20 >> 2013-01-03 23:53:42,616 INFO org.apache.hadoop.mapred.TaskTracker: >> Recieved >> ReinitTrackerAction from JobTracker >>=20 >> 2013-01-03 23:53:42,881 INFO org.apache.hadoop.util.ProcessTree: = Killing >> process group6089 with signal TERM. Exit code 0 >>=20 >> 2013-01-03 23:53:42,881 INFO org.apache.hadoop.mapred.TaskTracker: >> addFreeSlot : current free slots : 12 >>=20 >> 2013-01-03 23:53:43,348 WARN >> org.apache.hadoop.mapred.DefaultTaskController: >> Exit code from task is : 143 >>=20 >> 2013-01-03 23:53:43,348 INFO >> org.apache.hadoop.mapred.DefaultTaskController: >> Output from DefaultTaskController's launchTask follows: >>=20 >> 2013-01-03 23:53:43,348 INFO org.apache.hadoop.mapred.TaskController: >>=20 >> 2013-01-03 23:53:43,349 INFO org.apache.hadoop.mapred.JvmManager: JVM = : >> jvm_201301031813_0001_r_-2077112828 exited with exit code 143. Number = of >> tasks it ran: 0 >>=20 >> 2013-01-03 23:54:28,484 INFO >> org.apache.hadoop.filecache.TrackerDistributedCacheManager: = Cleanup... >>=20 >> java.lang.InterruptedException: sleep interrupted >>=20 >> at java.lang.Thread.sleep(Native Method) >>=20 >> at >> = org.apache.hadoop.filecache.TrackerDistributedCacheManager$CleanupThread.r= >> un >> (TrackerDistributedCacheManager.java:947) >>=20 >>=20 >>=20 >> Just around this time (actually a bit before) the JobTracker log = reports >> the >> following: >>=20 >> 2013-01-03 23:52:31,348 INFO org.apache.hadoop.mapred.JobTracker: = Lost >> tracker 'tracker_hd-37-03.dice.cluster:localhost/127.0.0.1:36005' >>=20 >> 2013-01-03 23:52:31,348 INFO org.apache.hadoop.mapred.TaskInProgress: >> Error >> from attempt_201301031813_0001_m_000010_0: Lost task tracker: >> tracker_hd-37-03.dice.cluster:localhost/127.0.0.1:36005 >>=20 >> 2013-01-03 23:52:31,349 INFO org.apache.hadoop.mapred.TaskInProgress: >> Error >> from attempt_201301031813_0001_m_000039_0: Lost task tracker: >> tracker_hd-37-03.dice.cluster:localhost/127.0.0.1:36005 >>=20 >> 2013-01-03 23:52:31,349 INFO org.apache.hadoop.mapred.TaskInProgress: >> Error >> from attempt_201301031813_0001_m_000041_0: Lost task tracker: >> tracker_hd-37-03.dice.cluster:localhost/127.0.0.1:36005 >>=20 >> 2013-01-03 23:52:31,349 INFO org.apache.hadoop.mapred.TaskInProgress: >> Error >> from attempt_201301031813_0001_m_000089_0: Lost task tracker: >> tracker_hd-37-03.dice.cluster:localhost/127.0.0.1:36005 >>=20 >> 2013-01-03 23:52:31,349 INFO org.apache.hadoop.mapred.TaskInProgress: >> Error >> from attempt_201301031813_0001_m_000092_0: Lost task tracker: >> tracker_hd-37-03.dice.cluster:localhost/127.0.0.1:36005 >>=20 >> Etc etc. >>=20 >>=20 >>=20 >> Can anyone help with this? >>=20 >>=20 >>=20 >> Royston >>=20 >>=20 >>=20 >>=20 >>=20 >>=20 >>=20 >=20