Date: Wed, 12 Oct 2016 18:53:20 +0000 (UTC)
From: "Junping Du (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5718) TimelineClient (and other places in YARN) shouldn't over-write HDFS client retry settings which could cause unexpected behavior

    [ https://issues.apache.org/jira/browse/YARN-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569568#comment-15569568 ]

Junping Du commented on YARN-5718:
----------------------------------

Thanks Vrushali for the quick
comments. I think the compile error is a bit misleading, but it does point to a real issue in TestFSRMStateStore that needs fixing (caused by a silly mistake I made when generating the v2 patch). The v2.1 patch should fix it.

> TimelineClient (and other places in YARN) shouldn't over-write HDFS client retry settings which could cause unexpected behavior
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5718
>                 URL: https://issues.apache.org/jira/browse/YARN-5718
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, timelineclient
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-5718-v2.1.patch, YARN-5718-v2.patch, YARN-5718.patch
>
>
> In one HA cluster, after the NameNode (NN) failed over, we noticed that jobs were failing because TimelineClient did not retry its connection against the correct NN. This happens because we overwrite the HDFS client settings with a hard-coded retry policy that is always enabled, which conflicts with the NN failover case: there, the HDFS client should fail fast so that it can retry on the other NN.
> YARN should not assume any retry policy for the HDFS client anywhere. It should stay consistent with the HDFS settings, which use different retry policies in different deployments. We should therefore clean up these hard-coded settings in YARN, including in FileSystemTimelineWriter, FileSystemRMStateStore and FileSystemNodeLabelsStore.
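To make the problem concrete, here is a minimal sketch contrasting the two patterns the issue describes: copying the user's configuration and then forcing client retries on, versus copying it and leaving the HDFS client retry settings untouched. It is a hypothetical illustration, not the YARN-5718 patch: a plain `java.util.Map` stands in for Hadoop's `Configuration`, the method names `overridingCopy` and `respectingCopy` are invented for this example, and the key string is meant to mirror the HDFS client retry switch but is used here only as a label.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map<String, String> stands in for Hadoop's Configuration.
class RetrySettingsSketch {
    // Key name assumed to mirror the HDFS client retry switch; illustration only.
    static final String RETRY_ENABLED_KEY = "dfs.client.retry.policy.enabled";

    // Problematic pattern: copy the user's config, then hard-code retries on,
    // clobbering whatever the HDFS deployment was configured with.
    static Map<String, String> overridingCopy(Map<String, String> userConf) {
        Map<String, String> fsConf = new HashMap<>(userConf);
        fsConf.put(RETRY_ENABLED_KEY, "true");
        return fsConf;
    }

    // Preferred pattern: copy the config and leave HDFS client retry settings alone.
    static Map<String, String> respectingCopy(Map<String, String> userConf) {
        return new HashMap<>(userConf);
    }

    public static void main(String[] args) {
        Map<String, String> haConf = new HashMap<>();
        // In an HA deployment the client should fail fast so failover can reach the other NN.
        haConf.put(RETRY_ENABLED_KEY, "false");

        System.out.println("overriding: " + overridingCopy(haConf).get(RETRY_ENABLED_KEY)); // overriding: true
        System.out.println("respecting: " + respectingCopy(haConf).get(RETRY_ENABLED_KEY)); // respecting: false
    }
}
```

Both helpers copy the input rather than mutating it, so the caller's configuration is never changed; the only difference is whether the retry setting chosen for the deployment survives into the copy.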