From: "Arun C Murthy (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Date: Tue, 6 Mar 2007 15:57:24 -0800 (PST)
Subject: [jira] Updated: (HADOOP-1060) IndexOutOfBoundsException in JobInProgress.updateTaskStatus leads to hung jobs
Message-ID: <27856957.1173225444298.JavaMail.root@brutus>
In-Reply-To: <2195783.1172882630737.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1060:
----------------------------------

    Status: Open  (was: Patch Available)

Need to double check this patch; the 0.10.1 release works with TASKTRACKER_EXPIRY_INTERVAL set to 3min.

> IndexOutOfBoundsException in JobInProgress.updateTaskStatus leads to hung jobs
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-1060
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1060
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>            Priority: Critical
>             Fix For: 0.12.1
>
>         Attachments: HADOOP-1060_20070305_1.patch
>
>
> When the JobTracker detects that a TaskTracker is 'lost' and tries to fail the incomplete tasks and the completed map tasks it fails with:
> 2007-03-03 00:38:24,056 ERROR org.apache.hadoop.mapred.JobTracker: Tracker Expiry Thread got exception: java.lang.IndexOutOfBoundsException: Index: 310, Size: 307
>         at java.util.ArrayList.RangeCheck(ArrayList.java:546)
>         at java.util.ArrayList.get(ArrayList.java:321)
>         at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:342)
>         at org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:862)
>         at org.apache.hadoop.mapred.JobTracker.lostTaskTracker(JobTracker.java:1637)
>         at org.apache.hadoop.mapred.JobTracker$ExpireTrackers.run(JobTracker.java:269)
>         at java.lang.Thread.run(Thread.java:595)
> Another instance of same exception:
> 2007-03-05 07:44:42,869 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 50020 call error: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 12341215, Size: 83189
> java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 12341215, Size: 83189
>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>         at java.util.ArrayList.get(ArrayList.java:322)
>         at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:342)
>         at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:1611)
>         at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:1163)
>         at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1037)
>         at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
> This means that the tasks aren't updated correctly and the JT just assumes the task is running and never restarts the task... thereby leading to a hung job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
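
For readers of the archive, the failure mode described in the report can be illustrated with a small standalone sketch. It is not Hadoop code and not the fix in the attached patch: the class LostTrackerSketch, the State enum, and every method name below are hypothetical, chosen only to show how an unchecked ArrayList.get() on a stale index aborts a status-update loop and leaves the remaining tasks marked as running. The bounds-checked variant is just one possible guard, shown for contrast.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical stand-in for per-job task bookkeeping.
public class LostTrackerSketch {
    enum State { RUNNING, FAILED }

    // Builds a job with taskCount tasks, all initially RUNNING.
    static List<State> newJob(int taskCount) {
        List<State> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(State.RUNNING);
        }
        return tasks;
    }

    // Marks each reported task as FAILED. An out-of-range index makes
    // ArrayList.get() throw IndexOutOfBoundsException, so every index
    // after it in the array is never processed.
    static void failTasksUnchecked(List<State> tasks, int[] reported) {
        for (int idx : reported) {
            State current = tasks.get(idx); // throws when idx >= tasks.size()
            if (current == State.RUNNING) {
                tasks.set(idx, State.FAILED);
            }
        }
    }

    // Same loop with an assumed guard: invalid indices are skipped and
    // counted, so the remaining tasks are still updated.
    static int failTasksChecked(List<State> tasks, int[] reported) {
        int skipped = 0;
        for (int idx : reported) {
            if (idx < 0 || idx >= tasks.size()) {
                skipped++;
                continue;
            }
            tasks.set(idx, State.FAILED);
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Index 310 against 307 tasks mirrors the numbers in the first trace.
        int[] reported = {0, 310, 2};

        List<State> unchecked = newJob(307);
        try {
            failTasksUnchecked(unchecked, reported);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("update aborted: " + e.getMessage());
        }
        System.out.println("task 2 after abort: " + unchecked.get(2));

        List<State> checked = newJob(307);
        int skipped = failTasksChecked(checked, reported);
        System.out.println("skipped=" + skipped + ", task 2: " + checked.get(2));
    }
}
```

In the unchecked run, task 2 is never reached and stays RUNNING, which is the same shape as the hang described above: the bookkeeping still says the task is running, so nothing reschedules it.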