Date: Tue, 6 Oct 2015 14:19:27 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-261) Ability to kill AM attempts

    [ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945087#comment-14945087 ]

Jason Lowe commented on YARN-261:
---------------------------------

Sorry for the late reply. IIRC the original patch implemented a fail attempt rather than a kill attempt because at the time that's all the YARN state machines supported. Back then if an application attempt did not unregister then the only option was to treat it as a failure.

If it's easy to add both kill and fail options then that would be great.
If it's complicated to implement kill then we can get this fail functionality in and add kill as a followup.

Latest patch looks pretty good besides the whitespace and checkstyle nits. One other nit: it would be nice to reuse a single constant final saving transition with the AttemptFailedTransition object rather than creating a unique one every time it's needed in the state machine. Also, the unit tests don't actually test the most common use case, which is failing an attempt that is running.

> Ability to kill AM attempts
> ---------------------------
>
>                 Key: YARN-261
>                 URL: https://issues.apache.org/jira/browse/YARN-261
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Jason Lowe
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-261.patch, YARN-261--n2.patch, YARN-261--n3.patch, YARN-261--n4.patch, YARN-261--n5.patch, YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed. This is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the AM supports recovery, and a particular AM attempt is stuck. Currently if this occurs the user's only recourse is to kill the entire application, requiring them to resubmit a new application and potentially breaking downstream dependent jobs if it's part of a bigger workflow. Killing the attempt would allow a new attempt to be started by the RM without killing the entire application, and if the AM supports recovery it could potentially save a lot of work. It could also be useful in workflow scenarios where the failure of the entire application kills the workflow, but the ability to kill an attempt can keep the workflow going if the subsequent attempt succeeds.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)