Date: Wed, 28 Jan 2015 16:14:34 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-6230) MR AM does not survive RM restart if RM activated a new AMRM secret key

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-6230:
----------------------------------
    Attachment: MAPREDUCE-6230.001.patch

Based on [my comment from YARN-3103|https://issues.apache.org/jira/browse/YARN-3103?focusedCommentId=14295216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14295216], here's a patch that stores the token in the *current* UGI, using whatever service name the RM set as the key/alias so that it clobbers the existing token, and then updates the
service name *after* the token has been stored in the credentials. Added a unit test, and also manually tested the patch on a secure cluster: the RM rolled the AMRM token master key, and after the new key was activated I restarted the RM while the app was still running. Verified that without this fix the AM failed to reconnect to the RM in that scenario, and that with this patch it reconnected successfully.

> MR AM does not survive RM restart if RM activated a new AMRM secret key
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6230
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6230
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: MAPREDUCE-6230.001.patch
>
>
> A MapReduce AM will fail to reconnect to an RM that performed a restart in the following scenario:
> # MapReduce job launched with AMRM token generated from AMRM secret X
> # RM rolls new AMRM secret Y and activates the new key
> # RM performs a work-preserving restart
> # MapReduce job AM now unable to connect to RM with "Invalid AMRMToken" exception

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
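The fix turns on ordering. The new token is stored under the alias the RM chose, so it replaces the old token, and only afterwards is its service field rewritten. The Java sketch below is a simplified, hypothetical model of that idea, not Hadoop code. `ModelToken`, `ModelCredentials`, the alias strings, and the contrasting "rename-then-store" variant are all illustrative assumptions. The model assumes only two things: credentials hold tokens by reference, keyed by alias, and a one-argument `addToken` uses the token's current service name as that alias. This is modeled on how `UserGroupInformation.addToken(Token)` keys tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model only: these are NOT Hadoop's Token/Credentials/UGI classes.
final class AmrmTokenAliasDemo {
    static final String RM_ALIAS = "rm-assigned-service";           // assumed alias chosen by the RM
    static final String CLIENT_SERVICE = "client-computed-service"; // assumed service name the client uses

    // Stand-in for a token: a secret plus a mutable service name.
    static final class ModelToken {
        final String secret;
        String service;
        ModelToken(String secret, String service) { this.secret = secret; this.service = service; }
    }

    // Stand-in for credentials: tokens stored by reference under an alias.
    static final class ModelCredentials {
        final Map<String, ModelToken> byAlias = new HashMap<>();

        // Uses the token's *current* service name as the alias, replacing any token with that alias.
        void addToken(ModelToken token) { byAlias.put(token.service, token); }

        // All stored tokens whose service field matches, i.e. candidates a client might select.
        List<ModelToken> tokensForService(String service) {
            List<ModelToken> matches = new ArrayList<>();
            for (ModelToken t : byAlias.values()) {
                if (t.service.equals(service)) matches.add(t);
            }
            return matches;
        }
    }

    // Startup state: the original token sits under the RM's alias, but its service was rewritten in place.
    static ModelCredentials startupCredentials() {
        ModelCredentials creds = new ModelCredentials();
        ModelToken original = new ModelToken("secret-X", RM_ALIAS);
        creds.addToken(original);          // stored under RM_ALIAS
        original.service = CLIENT_SERVICE; // alias stays RM_ALIAS
        return creds;
    }

    // Hypothetical contrasting ordering: rename first, so the new token lands under a different alias.
    static ModelCredentials updateRenameThenStore(ModelCredentials creds, String newSecret) {
        ModelToken fresh = new ModelToken(newSecret, RM_ALIAS);
        fresh.service = CLIENT_SERVICE;
        creds.addToken(fresh); // stored under CLIENT_SERVICE; the stale token under RM_ALIAS survives
        return creds;
    }

    // Ordering described in the patch comment: store under the RM's alias, then rename.
    static ModelCredentials updateStoreThenRename(ModelCredentials creds, String newSecret) {
        ModelToken fresh = new ModelToken(newSecret, RM_ALIAS);
        creds.addToken(fresh);          // replaces the stale token stored under RM_ALIAS
        fresh.service = CLIENT_SERVICE; // same object as the stored one, so the stored token is renamed too
        return creds;
    }

    public static void main(String[] args) {
        ModelCredentials renamedFirst = updateRenameThenStore(startupCredentials(), "secret-Y");
        ModelCredentials storedFirst = updateStoreThenRename(startupCredentials(), "secret-Y");
        System.out.println("rename-then-store matches: " + renamedFirst.tokensForService(CLIENT_SERVICE).size());
        System.out.println("store-then-rename matches: " + storedFirst.tokensForService(CLIENT_SERVICE).size());
    }
}
```

In this model, the rename-then-store ordering leaves two tokens with the client-side service name. The stale `secret-X` token sits alongside the new one, so either could be selected. Store-then-rename leaves exactly one token, carrying `secret-Y`. Whether Hadoop's real selection logic picks the stale token in the first case is not something this toy model establishes.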