Return-Path: X-Original-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id EF5809B7B for ; Wed, 11 Apr 2012 22:17:44 +0000 (UTC) Received: (qmail 43713 invoked by uid 500); 11 Apr 2012 22:17:44 -0000 Delivered-To: apmail-hadoop-mapreduce-issues-archive@hadoop.apache.org Received: (qmail 43668 invoked by uid 500); 11 Apr 2012 22:17:44 -0000 Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: mapreduce-issues@hadoop.apache.org Delivered-To: mailing list mapreduce-issues@hadoop.apache.org Received: (qmail 43659 invoked by uid 99); 11 Apr 2012 22:17:44 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 11 Apr 2012 22:17:44 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 11 Apr 2012 22:17:41 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 343503660B0 for ; Wed, 11 Apr 2012 22:17:20 +0000 (UTC) Date: Wed, 11 Apr 2012 22:17:20 +0000 (UTC) From: "Ravi Prakash (Assigned) (JIRA)" To: mapreduce-issues@hadoop.apache.org Message-ID: <1036132887.14952.1334182640215.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <939124358.38489.1333124190814.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Assigned] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash reassigned MAPREDUCE-4088: --------------------------------------- Assignee: Ravi Prakash > Task stuck in JobLocalizer prevented other tasks on the same node from committing > --------------------------------------------------------------------------------- > > Key: MAPREDUCE-4088 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4088 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1 > Affects Versions: 0.20.205.0 > Reporter: Ravi Prakash > Assignee: Ravi Prakash > Priority: Critical > > We saw that as a result of HADOOP-6963, one task was stuck in this > Thread 23668: (state = IN_NATIVE) > - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise) > - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame) > - java.io.File.exists() @bci=20, line=733 (Compiled frame) > - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=3, line=446 (Compiled frame) > - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame) > - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame) > .... > .... TONS MORE OF THIS SAME LINE > - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame) > ..... > ..... > - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Compiled frame) > - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 (Interpreted frame) > ne=451 (Interpreted frame) > - org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCacheObjects(org.apache.hadoop.conf.Configuration, java.net.URI[], org.apache.hadoop.fs.Path[], long[], boolean[], boolean) @bci=150, line=324 (Interpreted frame) > - org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCache(org.apache.hadoop.conf.Configuration) @bci=40, line=349 (Interpreted frame) 51, line=383 (Interpreted frame) > - org.apache.hadoop.mapred.JobLocalizer.runSetup(java.lang.String, java.lang.String, org.apache.hadoop.fs.Path, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=46, line=477 (Interpreted frame) > - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=20, line=534 (Interpreted frame) > - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=1, line=531 (Interpreted frame) > - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame) > - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame) > - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1082 (Interpreted frame) > - org.apache.hadoop.mapred.JobLocalizer.main(java.lang.String[]) @bci=266, line=530 (Interpreted frame) > While all other tasks on the same node were stuck in > Thread 32141: (state = BLOCKED) > - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame) > - org.apache.hadoop.mapred.Task.commit(org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter, org.apache.hadoop.mapreduce.OutputCommitter) @bci=24, line=980 (Compiled frame) > - org.apache.hadoop.mapred.Task.done(org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=146, line=871 (Interpreted frame) > - org.apache.hadoop.mapred.ReduceTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=470, line=423 (Interpreted frame) > - org.apache.hadoop.mapred.Child$4.run() @bci=29, line=255 (Interpreted frame) > - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame) > - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame) > - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1082 (Interpreted frame) > - org.apache.hadoop.mapred.Child.main(java.lang.String[]) @bci=738, line=249 (Interpreted frame) > This should never happen. A stuck task should never prevent other tasks from different jobs on the same node from committing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira