Date: Fri, 6 Jan 2017 21:39:58 +0000 (UTC)
From: "Yufei Gu (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-6061) Add a customized uncaughtexceptionhandler for fair scheduler
Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm

    [ https://issues.apache.org/jira/browse/YARN-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805840#comment-15805840 ]

Yufei Gu commented on YARN-6061:
--------------------------------

Yes. This sets the default handler if no specific one is set for the thread.
But we need a different handler here. When {{YarnUncaughtExceptionHandler}} gets a raw RuntimeException, it just logs an error and does not bring down the RM. This is fine for some threads, e.g. threads in a thread pool. But for other threads, like the update thread and the preemption thread in the fair scheduler, we should bring down the RM once an RTE is caught, since it makes no sense for the RM to keep running once these critical threads are dead. I realize that this should apply to all critical threads (critical means we should bring down the RM if the thread crashes). Maybe we should enlarge the scope to the whole RM instead of FS only.

> Add a customized uncaughtexceptionhandler for fair scheduler
> ------------------------------------------------------------
>
>                 Key: YARN-6061
>                 URL: https://issues.apache.org/jira/browse/YARN-6061
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler, yarn
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>              Labels: fairscheduler
>
> There are several threads in the fair scheduler. A thread quits when a runtime exception is thrown inside it. We should bring down the RM when that happens. Otherwise, there may be some weird behavior in the RM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
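
For illustration only, the distinction drawn in the comment above (a log-only default handler versus a handler that brings the process down for critical threads) could be sketched in plain Java roughly as follows. This is not the patch attached to YARN-6061. The class {{CriticalThreadExceptionHandler}}, the method {{runFailingThread}}, and the thread names are hypothetical, and the shutdown is modeled as a {{Runnable}} callback so the sketch can run without exiting the JVM; a real RM would terminate the process at that point.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CriticalThreadHandlerSketch {

  /**
   * Hypothetical handler for critical threads: log the failure, then run a
   * shutdown action (in a real RM this would terminate the process).
   */
  static final class CriticalThreadExceptionHandler
      implements Thread.UncaughtExceptionHandler {
    private final Runnable shutdownAction;

    CriticalThreadExceptionHandler(Runnable shutdownAction) {
      this.shutdownAction = shutdownAction;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
      System.err.println("Critical thread " + t.getName() + " died: " + e);
      shutdownAction.run();
    }
  }

  /**
   * Starts a thread whose task throws a RuntimeException and returns how
   * many times a shutdown was requested as a result.
   */
  static int runFailingThread(boolean critical) throws InterruptedException {
    AtomicInteger shutdownRequests = new AtomicInteger();

    // Default handler: used only when a thread has no handler of its own.
    // It logs and nothing else, which is fine for e.g. pool workers.
    Thread.setDefaultUncaughtExceptionHandler(
        (t, e) -> System.err.println("Thread " + t.getName() + " died: " + e));

    Thread thread = new Thread(
        () -> { throw new RuntimeException("simulated failure"); },
        critical ? "update-thread" : "pool-worker");

    if (critical) {
      // A per-thread handler takes precedence over the default handler.
      thread.setUncaughtExceptionHandler(
          new CriticalThreadExceptionHandler(shutdownRequests::incrementAndGet));
    }

    thread.start();
    thread.join(); // the handler runs on the dying thread before join() returns
    return shutdownRequests.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("non-critical: " + runFailingThread(false));
    System.out.println("critical: " + runFailingThread(true));
  }
}
```

Passing the shutdown action in as a callback is just a convenience for the sketch; it also means the same handler class could be attached to any thread considered critical, in the fair scheduler or elsewhere in the RM.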