Date: Tue, 13 Sep 2016 17:49:20 +0000 (UTC)
From: "Allen Wittenauer (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Comment Edited] (YARN-5635) Better handling when bad script is configured as Node's HealthScript

    [ https://issues.apache.org/jira/browse/YARN-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487889#comment-15487889 ]

Allen Wittenauer edited comment on YARN-5635 at 9/13/16 5:49 PM:
-----------------------------------------------------------------

bq.
does that hold true for even making it an option via a configuration setting?

Yes. I don't know how many ways I can tell you that depending on an exit code here is extremely dangerous and has proven to be unreliable due to the constantly shifting nature of the state of the node on busy clusters. Throw in all of these "magically expanding/shrinking" task resource management bits that have gone in, and the situation gets even worse.

Besides, if you REALLY REALLY REALLY want to do this, all you need to do is wrap your existing health check in something else that, upon failure, prints the ERROR message.

was (Author: aw):
bq. does that hold true for even making it an option via a configuration setting?

Yes. I don't know how many ways I can tell you that depending on an error code here is extremely dangerous and has proven to be unreliable due to the constantly shifting nature of the state of the node on busy clusters. Throw in all of these "magically expanding/shrinking" task resource management bits that have gone in, and the situation gets even worse.

Besides, if you REALLY REALLY REALLY want to do this, all you need to do is wrap your existing health check in something else that, upon failure, prints the ERROR message.

> Better handling when bad script is configured as Node's HealthScript
> --------------------------------------------------------------------
>
>                 Key: YARN-5635
>                 URL: https://issues.apache.org/jira/browse/YARN-5635
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Allen Wittenauer
>            Assignee: Yufei Gu
>
> The earlier fix for YARN-5567 was reverted because it's not ideal to bring the whole cluster down because of a bad script. At the same time, it's important to report when the script configured as the node health script is erroneous, as it might fail to detect the bad health of a node.
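The wrapper approach suggested in the comment above could look roughly like the following. This is only a sketch under stated assumptions, not anything taken from the issue or from Hadoop itself: the function name `run_wrapped`, the wording of the ERROR line, and passing the existing check as command-line arguments are all made up for illustration. It relies on the documented behavior that the NodeManager treats a line of script output beginning with ERROR as a sign of an unhealthy node.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper that turns a failing exit code into an ERROR line."""
import subprocess
import sys


def run_wrapped(cmd):
    """Run the existing health check and return the lines the wrapper prints.

    `cmd` is the existing check as an argument list (an assumption made for
    this sketch). The check's own output is passed through unchanged; an
    extra ERROR line is appended only when the exit code is non-zero.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
        text=True,
    )
    lines = result.stdout.splitlines()
    if result.returncode != 0:
        # The line must begin with ERROR to be picked up as "unhealthy".
        lines.append(
            "ERROR health check exited with code {}".format(result.returncode)
        )
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: wrapper.py /path/to/existing_check.sh [args...]
    for line in run_wrapped(sys.argv[1:]):
        print(line)
```

Anyone who opts into this takes on the exit-code unreliability described in the comment; the wrapper only moves that decision out of YARN and into the site's own script.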
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org