Date: Thu, 22 Nov 2012 01:56:58 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: mapreduce-dev@hadoop.apache.org
Reply-To: mapreduce-dev@hadoop.apache.org
Message-ID: <1277870421.15445.1353549418202.JavaMail.jiratomcat@arcas>
Subject: [jira] [Created] (MAPREDUCE-4817) Hardcoded task ping timeout kills tasks localizing large amounts of data

Jason Lowe created MAPREDUCE-4817:
-------------------------------------

Summary: Hardcoded task ping timeout kills tasks localizing large amounts of data
Key: MAPREDUCE-4817
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4817
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mr-am
Affects Versions: 0.23.3, 2.0.3-alpha
Reporter: Jason Lowe

When a task is launched and spends more than 5 minutes localizing files, the AM will kill the task due to ping timeout.
The AM's TaskHeartbeatHandler currently tracks tasks via two timeouts: a progress timeout and a ping timeout. The progress timeout can be controlled via mapreduce.task.timeout and can even be disabled by setting that property to 0. The ping timeout, however, is hardcoded to 5 minutes and cannot be configured. Therefore, if a task spends too long localizing, it never starts running and so never pings the AM, and the AM kills it when the ping timeout expires.
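
The interaction between the two timeouts can be illustrated with a small, self-contained model. This is only a sketch, not the actual TaskHeartbeatHandler code: the class HeartbeatSketch, the check method, and the Verdict enum are hypothetical names, and the sketch assumes the ping clock starts when the task is launched rather than at its first ping. It shows that disabling the progress timeout (0) does not help a task that localizes for 6 minutes, while a longer ping timeout (a hypothetical configurable value, not an existing property) would.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of the two liveness checks described in this issue.
 * Not the real TaskHeartbeatHandler; all names here are illustrative.
 */
final class HeartbeatSketch {
    /** The fixed 5-minute ping timeout mentioned in the report. */
    static final long HARDCODED_PING_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(5);

    enum Verdict { ALIVE, PING_TIMEOUT, PROGRESS_TIMEOUT }

    /**
     * Decide whether a task would be considered dead.
     *
     * @param sinceLastPingMs     time since the last ping, or since launch if
     *                            the task has not pinged yet (assumption)
     * @param sinceLastProgressMs time since the last progress report
     * @param pingTimeoutMs       ping timeout to apply
     * @param progressTimeoutMs   value of mapreduce.task.timeout; 0 disables it
     */
    static Verdict check(long sinceLastPingMs, long sinceLastProgressMs,
                         long pingTimeoutMs, long progressTimeoutMs) {
        // The ping check has no "disabled" value in this model.
        if (sinceLastPingMs > pingTimeoutMs) {
            return Verdict.PING_TIMEOUT;
        }
        // The progress check is skipped entirely when the timeout is 0.
        if (progressTimeoutMs > 0 && sinceLastProgressMs > progressTimeoutMs) {
            return Verdict.PROGRESS_TIMEOUT;
        }
        return Verdict.ALIVE;
    }

    public static void main(String[] args) {
        long localizing = TimeUnit.MINUTES.toMillis(6);

        // Progress timeout disabled, hardcoded ping timeout: task is killed.
        System.out.println(check(localizing, localizing,
                HARDCODED_PING_TIMEOUT_MS, 0));

        // Same task with a hypothetical 10-minute ping timeout: task survives.
        System.out.println(check(localizing, localizing,
                TimeUnit.MINUTES.toMillis(10), 0));
    }
}
```

Running main prints PING_TIMEOUT followed by ALIVE, which mirrors the reported behavior: setting mapreduce.task.timeout to 0 only removes the progress check, so the ping check still kills a task that is slow to localize.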